Predicting Liver Disease Using Machine Learning: A Data-Driven Approach¶
Problem Statement¶
Liver diseases are a significant global health concern, affecting millions of people and leading to severe complications if not diagnosed early. Traditional diagnostic methods rely heavily on laboratory tests and clinical expertise, which may be time-consuming and require specialized resources. There is a need for an efficient, data-driven approach to predict liver disease accurately using patient records. This study aims to leverage machine learning models to develop a predictive system that can assist healthcare professionals in diagnosing liver disease based on clinical and demographic data.
Context¶
Liver disease is a broad term that includes conditions such as fatty liver, hepatitis, cirrhosis, and liver cancer. These diseases are influenced by various factors, including alcohol consumption, viral infections, metabolic disorders, and genetic predisposition. Early detection and timely intervention can significantly improve patient outcomes.
This dataset, comprising 30,691 patient records with 11 clinical features, provides an opportunity to develop an automated predictive model. The dataset includes laboratory test results, demographic information, and other biomarkers crucial for detecting liver abnormalities. By applying machine learning techniques, we can enhance diagnostic accuracy, reduce misclassification rates, and support medical professionals in clinical decision-making.
Objective¶
- Develop a binary classification model to predict whether a patient has liver disease (the raw labels are 1 = liver patient, 2 = non-liver patient).
- Explore various machine learning algorithms to determine the most effective model for liver disease prediction.
- Analyze feature importance to identify the key clinical markers contributing to liver disease detection.
- Improve predictive accuracy using feature engineering, hyperparameter tuning, and ensemble learning techniques.
- Provide a framework for deploying a real-world decision support system that can assist healthcare professionals in diagnosing liver disease efficiently.
Dataset¶
Liver Disease Patient Dataset 30K train data
- archive.zip
- Age Age of the patient
- Gender Gender of the patient
- TB Total Bilirubin
- Bilirubin is a yellow pigment formed during red blood cell breakdown.
- High levels indicate potential liver dysfunction or bile duct obstruction.
- DB Direct Bilirubin
- A fraction of total bilirubin that is water-soluble.
- Elevated direct bilirubin suggests obstructive jaundice or hepatitis.
- Alkphos Alkaline Phosphotase
- An enzyme found in the liver, bones, and bile ducts.
- High ALP levels may indicate cholestasis (bile blockage), liver disease, or bone disorders.
- Sgpt Alamine Aminotransferase
- An enzyme found in liver cells.
- Elevated ALT levels suggest liver cell damage (hepatitis, fatty liver, or alcohol-related liver disease).
- Sgot Aspartate Aminotransferase
- Another enzyme in the liver and muscles.
- High SGOT levels indicate liver or muscle damage.
- TP Total Protiens
- Sum of albumin & globulin proteins in the blood.
- Low protein levels may indicate malnutrition, liver, or kidney disease.
- ALB Albumin
- A protein made by the liver that helps maintain blood volume and transport nutrients.
- Low albumin is a marker of chronic liver disease, malnutrition, or kidney disorders.
- A/G Ratio Albumin and Globulin Ratio
- Measures the balance between albumin & globulin proteins.
- Low A/G ratio can indicate chronic liver disease, autoimmune disorders, or inflammation.
- Result
- Selector field used to split the data into two sets (labeled by the experts)
- 1 Liver Patient → Patients diagnosed with liver disease.
- 2 Non-Liver Patient → Patients without liver disease.
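Many scikit-learn metrics assume a 0/1 label encoding, while this dataset uses 1/2. A minimal recoding sketch (the short series below is a hypothetical stand-in for the real Result column):

```python
import pandas as pd

# Hypothetical labels following the dataset's 1/2 convention
result = pd.Series([1, 2, 1, 1, 2])

# Recode: 1 (liver patient) -> 1, 2 (non-liver patient) -> 0
target = result.map({1: 1, 2: 0})
print(target.tolist())  # [1, 0, 1, 1, 0]
```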
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Import Libraries¶
Uncomment and run this cell once, restart the kernel, re-comment it, then run all remaining cells
# # Uninstall conflicting packages completely
# !pip uninstall -y numpy scikit-learn imbalanced-learn scikeras tensorflow torch matplotlib seaborn pandas scipy dask-ml
# # Upgrade pip, setuptools, and wheel
# !pip install --upgrade pip setuptools wheel
# # Purge pip cache to remove broken package installs
# !pip cache purge
# # Install compatible versions of required packages
# !pip install --no-cache-dir numpy==1.26.4 scikit-learn==1.4.2 imbalanced-learn==0.13.0 \
# scikeras==0.13.0 tensorflow==2.18.0 torch==2.6.0 torchvision \
# torchaudio matplotlib==3.7.1 seaborn pandas scipy fastai dask-ml
# # Install any additional dependencies for imbalanced-learn (SMOTE)
# !pip install --no-cache-dir imbalanced-learn
# # 🚨 STOP HERE 🚨
# print("\n⚠️ Restart the notebook kernel NOW before running anything else.")
Run next cell
# ---------------------------------
# After restarting, run the following:
# ---------------------------------
# Import essential libraries and verify versions
import numpy as np
import sklearn
import tensorflow as tf
import torch
import imblearn
import scikeras
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import scipy
import dask_ml
from tensorflow.keras import backend
# Fixing the seed for random number generators to ensure reproducibility
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Suppress warnings for a clean output
import warnings
warnings.filterwarnings("ignore")
# ✅ Verify Installed Versions
print("\n✅ Installed Versions:")
print("NumPy:", np.__version__)
print("SciKeras:", scikeras.__version__)
print("Scikit-Learn:", sklearn.__version__)
print("TensorFlow:", tf.__version__)
print("PyTorch:", torch.__version__)
print("Imbalanced-Learn:", imblearn.__version__)
print("Matplotlib:", matplotlib.__version__)
print("Seaborn:", sns.__version__)
print("Pandas:", pd.__version__)
print("Scipy:", scipy.__version__)
print("Dask-ML:", dask_ml.__version__)
✅ Installed Versions: NumPy: 1.26.4 SciKeras: 0.13.0 Scikit-Learn: 1.4.2 TensorFlow: 2.18.0 PyTorch: 2.6.0+cu124 Imbalanced-Learn: 0.13.0 Matplotlib: 3.7.1 Seaborn: 0.13.2 Pandas: 2.2.3 Scipy: 1.15.2 Dask-ML: 2024.4.4
Loading the Data¶
Unzip
import zipfile
# Define the path to your zip file
zip_path = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/archive.zip"
# Define your chosen extraction directory
extract_to = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/" # Replace with your desired directory
# Extract the files to the specified directory
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_to)
print(f"Files extracted to: {extract_to}")
Files extracted to: /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/
Load
import pandas as pd
# Load the CSV file, specifying the encoding to avoid decode errors
df_train = pd.read_csv(
    "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Liver Patient Dataset (LPD)_train.csv",
    encoding="ISO-8859-1"
)
# Load the Excel-formatted test file
df_test = pd.read_excel(
    "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/test.csv.xlsx"
)
df_train.shape, df_test.shape
((30691, 11), (2109, 10))
df_train.columns.to_list()
['Age of the patient', 'Gender of the patient', 'Total Bilirubin', 'Direct Bilirubin', '\xa0Alkphos Alkaline Phosphotase', '\xa0Sgpt Alamine Aminotransferase', 'Sgot Aspartate Aminotransferase', 'Total Protiens', '\xa0ALB Albumin', 'A/G Ratio Albumin and Globulin Ratio', 'Result']
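Several column names above carry leading non-breaking spaces (`\xa0`), which makes them easy to mistype. A small normalization sketch (the toy frame is hypothetical; the same two lines should apply unchanged to `df_train`):

```python
import pandas as pd

# Hypothetical frame reproducing the whitespace problem in df_train's headers
df = pd.DataFrame(columns=['Age of the patient', '\xa0ALB Albumin', 'Result'])

# Replace non-breaking spaces, then strip ordinary leading/trailing whitespace
df.columns = df.columns.str.replace('\xa0', ' ', regex=False).str.strip()
print(df.columns.tolist())  # ['Age of the patient', 'ALB Albumin', 'Result']
```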
df_test.columns.to_list()
[66, 'Female', 0.9, 0.2, 210, 35, 32, 8, 3.9, '0.9.1']
df_train.head()
| Age of the patient | Gender of the patient | Total Bilirubin | Direct Bilirubin | Alkphos Alkaline Phosphotase | Sgpt Alamine Aminotransferase | Sgot Aspartate Aminotransferase | Total Protiens | ALB Albumin | A/G Ratio Albumin and Globulin Ratio | Result | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 65.0 | Female | 0.7 | 0.1 | 187.0 | 16.0 | 18.0 | 6.8 | 3.3 | 0.90 | 1 |
| 1 | 62.0 | Male | 10.9 | 5.5 | 699.0 | 64.0 | 100.0 | 7.5 | 3.2 | 0.74 | 1 |
| 2 | 62.0 | Male | 7.3 | 4.1 | 490.0 | 60.0 | 68.0 | 7.0 | 3.3 | 0.89 | 1 |
| 3 | 58.0 | Male | 1.0 | 0.4 | 182.0 | 14.0 | 20.0 | 6.8 | 3.4 | 1.00 | 1 |
| 4 | 72.0 | Male | 3.9 | 2.0 | 195.0 | 27.0 | 59.0 | 7.3 | 2.4 | 0.40 | 1 |
df_test.head()
| 66 | Female | 0.9 | 0.2 | 210 | 35 | 32 | 8 | 3.9 | 0.9.1 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 50 | Male | 9.4 | 5.2 | 268 | 21 | 63 | 6.4 | 2.8 | 0.8 |
| 1 | 42 | Female | 3.5 | 1.6 | 298 | 68 | 200 | 7.1 | 3.4 | 0.9 |
| 2 | 65 | Male | 1.7 | 0.8 | 315 | 12 | 38 | 6.3 | 2.1 | 0.5 |
| 3 | 22 | Male | 3.3 | 1.5 | 214 | 54 | 152 | 5.1 | 1.8 | 0.5 |
| 4 | 31 | Female | 1.1 | 0.3 | 138 | 14 | 21 | 7.0 | 3.8 | 1.1 |
- The test file has no header row (its "column names" are actually the first record's values), so it will not be used here.
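If the test set were needed, one way to salvage it would be to reload with `header=None` and supply the training column names. A sketch under that assumption (the in-memory CSV and short column list are illustrative, not the real file):

```python
import io
import pandas as pd

# Hypothetical headerless CSV standing in for the real test file
raw = "50,Male,9.4\n42,Female,3.5\n"
cols = ['Age of the patient', 'Gender of the patient', 'Total Bilirubin']

# header=None stops pandas from consuming the first record as column names
df_test = pd.read_csv(io.StringIO(raw), header=None, names=cols)
print(df_test.shape)  # (2, 3)
```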
Data Overview¶
data = df_train.copy()
data.shape
(30691, 11)
data.head(5)
| Age of the patient | Gender of the patient | Total Bilirubin | Direct Bilirubin | Alkphos Alkaline Phosphotase | Sgpt Alamine Aminotransferase | Sgot Aspartate Aminotransferase | Total Protiens | ALB Albumin | A/G Ratio Albumin and Globulin Ratio | Result | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 65.0 | Female | 0.7 | 0.1 | 187.0 | 16.0 | 18.0 | 6.8 | 3.3 | 0.90 | 1 |
| 1 | 62.0 | Male | 10.9 | 5.5 | 699.0 | 64.0 | 100.0 | 7.5 | 3.2 | 0.74 | 1 |
| 2 | 62.0 | Male | 7.3 | 4.1 | 490.0 | 60.0 | 68.0 | 7.0 | 3.3 | 0.89 | 1 |
| 3 | 58.0 | Male | 1.0 | 0.4 | 182.0 | 14.0 | 20.0 | 6.8 | 3.4 | 1.00 | 1 |
| 4 | 72.0 | Male | 3.9 | 2.0 | 195.0 | 27.0 | 59.0 | 7.3 | 2.4 | 0.40 | 1 |
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 30691 entries, 0 to 30690 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Age of the patient 30689 non-null float64 1 Gender of the patient 29789 non-null object 2 Total Bilirubin 30043 non-null float64 3 Direct Bilirubin 30130 non-null float64 4 Alkphos Alkaline Phosphotase 29895 non-null float64 5 Sgpt Alamine Aminotransferase 30153 non-null float64 6 Sgot Aspartate Aminotransferase 30229 non-null float64 7 Total Protiens 30228 non-null float64 8 ALB Albumin 30197 non-null float64 9 A/G Ratio Albumin and Globulin Ratio 30132 non-null float64 10 Result 30691 non-null int64 dtypes: float64(9), int64(1), object(1) memory usage: 2.6+ MB
- 30,691 observations
- 11 columns
data.dtypes.value_counts()
| count | |
|---|---|
| float64 | 9 |
| object | 1 |
| int64 | 1 |
Check for duplicated data
data.duplicated().sum()
11323
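Over a third of the rows are exact duplicates. Whether to drop them depends on whether they are repeated records of the same patients or coincidentally identical ones; if dropping, `drop_duplicates` keeps the first occurrence. A toy sketch:

```python
import pandas as pd

# Hypothetical frame with two repeated rows
df = pd.DataFrame({'Age': [65, 62, 65, 62], 'Result': [1, 1, 1, 1]})
print(int(df.duplicated().sum()))  # 2

# Keep only the first occurrence of each duplicated row
deduped = df.drop_duplicates()
print(deduped.shape)  # (2, 2)
```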
Check for null data
data.isnull().sum()
| 0 | |
|---|---|
| Age of the patient | 2 |
| Gender of the patient | 902 |
| Total Bilirubin | 648 |
| Direct Bilirubin | 561 |
| Alkphos Alkaline Phosphotase | 796 |
| Sgpt Alamine Aminotransferase | 538 |
| Sgot Aspartate Aminotransferase | 462 |
| Total Protiens | 463 |
| ALB Albumin | 494 |
| A/G Ratio Albumin and Globulin Ratio | 559 |
| Result | 0 |
data.isnull().sum().sum()
5425
# Let's check for missing values in the data
round(data.isnull().sum() / data.isnull().count() * 100, 2) # calculates the percentage of missing values in each column of the DataFrame
| 0 | |
|---|---|
| Age of the patient | 0.01 |
| Gender of the patient | 2.94 |
| Total Bilirubin | 2.11 |
| Direct Bilirubin | 1.83 |
| Alkphos Alkaline Phosphotase | 2.59 |
| Sgpt Alamine Aminotransferase | 1.75 |
| Sgot Aspartate Aminotransferase | 1.51 |
| Total Protiens | 1.51 |
| ALB Albumin | 1.61 |
| A/G Ratio Albumin and Globulin Ratio | 1.82 |
| Result | 0.00 |
Get the class distribution of the target column "Result"
# counts and proportions of each class in "Result"
data["Result"].value_counts(), data["Result"].value_counts(normalize=True)
(Result 1 21917 2 8774 Name: count, dtype: int64, Result 1 0.714118 2 0.285882 Name: proportion, dtype: float64)
1 Liver Patient
2 Non Liver Patient
The classes are imbalanced (about 71% liver patients vs. 29% non-liver), which can bias models toward the majority class
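One way to counter the roughly 71/29 split without resampling is class weighting. A sketch using scikit-learn's balanced weights on hypothetical labels with the same proportions:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels mirroring the ~71/29 class split
y = np.array([1] * 71 + [2] * 29)

# 'balanced' weights: n_samples / (n_classes * class_count)
weights = compute_class_weight(class_weight='balanced', classes=np.array([1, 2]), y=y)
print({1: round(float(weights[0]), 3), 2: round(float(weights[1]), 3)})  # {1: 0.704, 2: 1.724}
```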
data.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| Age of the patient | 30689.0 | 44.107205 | 15.981043 | 4.0 | 32.0 | 45.0 | 55.0 | 90.0 |
| Total Bilirubin | 30043.0 | 3.370319 | 6.255522 | 0.4 | 0.8 | 1.0 | 2.7 | 75.0 |
| Direct Bilirubin | 30130.0 | 1.528042 | 2.869592 | 0.1 | 0.2 | 0.3 | 1.3 | 19.7 |
| Alkphos Alkaline Phosphotase | 29895.0 | 289.075364 | 238.537589 | 63.0 | 175.0 | 209.0 | 298.0 | 2110.0 |
| Sgpt Alamine Aminotransferase | 30153.0 | 81.488641 | 182.158850 | 10.0 | 23.0 | 35.0 | 62.0 | 2000.0 |
| Sgot Aspartate Aminotransferase | 30229.0 | 111.469979 | 280.851078 | 10.0 | 26.0 | 42.0 | 88.0 | 4929.0 |
| Total Protiens | 30228.0 | 6.480237 | 1.081980 | 2.7 | 5.8 | 6.6 | 7.2 | 9.6 |
| ALB Albumin | 30197.0 | 3.130142 | 0.792281 | 0.9 | 2.6 | 3.1 | 3.8 | 5.5 |
| A/G Ratio Albumin and Globulin Ratio | 30132.0 | 0.943467 | 0.323164 | 0.3 | 0.7 | 0.9 | 1.1 | 2.8 |
| Result | 30691.0 | 1.285882 | 0.451841 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 |
- The dataset contains 30,689 records for "Age of the Patient."
- Other medical attributes such as bilirubin levels, enzyme levels, and protein ratios have slightly fewer records, indicating some missing values.
- The Result column is the binary outcome: 1 = liver patient, 2 = non-liver patient.
Outliers & Variability
- Features like Sgpt, Sgot, and Bilirubin levels show extreme max values. Consider handling outliers through log transformations or winsorization.
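The two remedies mentioned can be sketched side by side: `log1p` compresses the long right tail, while clipping at chosen percentiles (winsorization) caps extremes without changing the scale. The short series is a hypothetical stand-in for a skewed lab column:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values, similar in shape to Total Bilirubin
s = pd.Series([0.7, 0.8, 1.0, 2.7, 75.0])

# Option 1: log transform (log1p handles values near zero safely)
logged = np.log1p(s)

# Option 2: winsorize by clipping at the 1st/99th percentiles
capped = s.clip(lower=s.quantile(0.01), upper=s.quantile(0.99))
print(float(capped.max()) < 75.0)  # True
```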
Missing Data
- Some features have missing values (e.g., Total Bilirubin, ALB Albumin).
- Use imputation techniques (mean/median imputation or predictive modeling) to fill gaps.
Feature Importance
- "A/G Ratio" and "Bilirubin" are crucial indicators of liver disease.
- Consider correlation analysis and feature selection before modeling.
Class Imbalance Check
- The Result column (1 or 2) should be analyzed for class distribution.
- If imbalanced, consider SMOTE (Synthetic Minority Over-sampling Technique) or class weighting in models.
Unique Values
data.nunique()
| 0 | |
|---|---|
| Age of the patient | 77 |
| Gender of the patient | 2 |
| Total Bilirubin | 113 |
| Direct Bilirubin | 80 |
| Alkphos Alkaline Phosphotase | 263 |
| Sgpt Alamine Aminotransferase | 152 |
| Sgot Aspartate Aminotransferase | 177 |
| Total Protiens | 58 |
| ALB Albumin | 40 |
| A/G Ratio Albumin and Globulin Ratio | 69 |
| Result | 2 |
data['Gender of the patient'].value_counts()
| count | |
|---|---|
| Gender of the patient | |
| Male | 21986 |
| Female | 7803 |
EDA¶
Univariate Analysis¶
# Function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12, 7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,       # 2 rows of subplots
        sharex=True,   # x-axis shared among subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )
    # Boxplot on top; a marker indicates the mean of the column
    sns.boxplot(data=data, x=feature, ax=ax_box2, showmeans=True, color="violet")
    # Histogram below, optionally with a fixed number of bins
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    ax_hist2.axvline(data[feature].mean(), color="green", linestyle="--")   # mean
    ax_hist2.axvline(data[feature].median(), color="black", linestyle="-")  # median
histogram_boxplot(data, "Age of the patient")
- Majority of patients are between 30-60 years old, with the highest concentration near 40-45 years.
- Few elderly patients (80+), but they appear as mild outliers.
- The second peak at 55+ suggests an additional age cluster, possibly due to different medical conditions or risk factors.
histogram_boxplot(data, "Total Bilirubin")
- Most patients have normal bilirubin levels (below 2.7), but there are extreme cases.
- Strong right-skewness suggests abnormal values that could be from severe medical conditions.
histogram_boxplot(data, "Direct Bilirubin")
- Most patients have normal direct bilirubin levels (below 2.0), but extreme cases exist.
- A strong right-skewed distribution with high outliers suggests severe liver disease.
histogram_boxplot(data, "\xa0Alkphos Alkaline Phosphotase")
- Most patients have ALP levels between 100-400 U/L, but there are extreme cases.
- Strong right-skewed distribution suggests abnormally high ALP values in some patients.
histogram_boxplot(data, "\xa0Sgpt Alamine Aminotransferase")
- Most patients have SGPT levels below 120 U/L, but some extreme cases exceed 2,000 U/L.
- Strong right-skewed distribution suggests a large number of high outliers.
histogram_boxplot(data, "Sgot Aspartate Aminotransferase")
- Most patients have SGOT levels below 100 U/L, but some extreme cases exceed 5,000 U/L.
- Strong right-skewed distribution with many high outliers.
histogram_boxplot(data, "Total Protiens")
- Most patients have total protein levels between 6.0 - 7.5 g/dL, aligning with normal protein range.
- Slight right skewness, but the distribution is mostly normal.
histogram_boxplot(data, "\xa0ALB Albumin")
- Most patients have albumin levels between 2.5 - 3.8 g/dL, with many values below the typical normal range (~3.5 - 5.0 g/dL).
- Slight right skewness, but the distribution is mostly normal.
histogram_boxplot(data, "A/G Ratio Albumin and Globulin Ratio")
- Most patients have an A/G ratio between 0.7 - 1.1, which is within the normal range.
- Slight right skewness, but the distribution is mostly normal.
histogram_boxplot(data, "Result")
- "Result" is a binary target variable (1 vs. 2).
- Class 1 dominates, meaning the dataset is imbalanced.
# Function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of counts (default False)
    n: display the top n category levels (default None, i.e., all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            # percentage of each class of the category
            label = "{:.1f}%".format(100 * p.get_height() / total)
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # bar center
        y = p.get_height()                 # bar height
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate each bar
    plt.show()
labeled_barplot(data, "Gender of the patient")
# Ensure the "Result" column exists before plotting
if "Result" in data.columns:
    # Count values in the "Result" column
    result_counts = data["Result"].value_counts()
    # Labels and sizes for the pie chart
    labels = ["Liver Patient", "Non-Liver Patient"]
    sizes = [result_counts[1], result_counts[2]]
    # Explode the second slice for emphasis
    explode = (0, 0.1)
    fig, ax = plt.subplots(figsize=(10, 8))
    ax.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
           shadow=True, startangle=90, colors=['#ff9999', '#66b3ff'])
    ax.axis('equal')  # equal aspect ratio draws the pie as a circle
    plt.title("Proportion of Liver and Non-Liver Patients", size=20)
    plt.show()
else:
    print("The column 'Result' is not found in the dataset.")
Bivariate Analysis¶
# Function to plot distributions of a predictor with respect to the target
def distribution_plot_wrt_target(data, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(12, 10))
    target_uniq = data[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )
    plt.tight_layout()
    plt.show()
distribution_plot_wrt_target(data, "Gender of the patient", "Result")
- More male patients have liver disease (Target = 1) compared to females.
- Non-liver patients (Target = 2) have a more balanced gender distribution, but males still dominate.
- Boxplots confirm gender differences, showing that males are more affected by liver disease.
- The distribution suggests a potential gender-based risk factor for liver disease.
distribution_plot_wrt_target(data, "Age of the patient", "Result")
- Age distribution is similar for both liver and non-liver patients, peaking around 40-50 years.
- Liver disease patients (Target = 1) show a slightly wider spread, with more cases at younger and older ages.
- Boxplots confirm that median age is similar, but liver disease cases have slightly more extreme outliers.
- No significant age-based distinction, suggesting age alone may not be a strong predictor of liver disease.
distribution_plot_wrt_target(data, "Total Bilirubin", "Result")
- Liver patients (Target = 1) have significantly higher Total Bilirubin levels than non-liver patients.
- Strong right-skewed distribution in both groups, with extreme outliers in liver disease cases (values exceeding 70).
- Boxplots confirm a higher median bilirubin level in liver disease patients, with a much wider spread.
- Non-liver patients mostly have bilirubin levels below 2, while liver patients show a much broader range.
distribution_plot_wrt_target(data, "Direct Bilirubin", "Result")
- Liver patients (Target = 1) have significantly higher Direct Bilirubin levels than non-liver patients.
- Strong right-skewed distribution in both groups, with extreme outliers in liver disease cases (values exceeding 15).
- Boxplots confirm a higher median Direct Bilirubin level in liver disease patients, with a much wider spread.
- Non-liver patients mostly have Direct Bilirubin levels below 0.5, while liver patients show a much broader range.
distribution_plot_wrt_target(data, "\xa0Alkphos Alkaline Phosphotase", "Result")
- Liver patients (Target = 1) have generally higher ALP levels compared to non-liver patients.
- Strong right-skewed distribution, with extreme outliers above 2000 U/L in liver disease cases.
- Boxplots show a higher median ALP level in liver patients, with a wider interquartile range (IQR).
- Non-liver patients mostly have ALP levels below 250, while liver patients exhibit a much broader spread.
distribution_plot_wrt_target(data, "\xa0Sgpt Alamine Aminotransferase", "Result")
- Liver patients (Target = 1) have significantly higher SGPT (ALT) levels than non-liver patients.
- Strong right-skewed distribution, with extreme outliers exceeding 1500 U/L in liver disease cases.
- Boxplots confirm a higher median SGPT level in liver disease patients, with a much wider interquartile range (IQR).
- Non-liver patients mostly have SGPT levels below 50, while liver patients exhibit a broader range with high variability.
distribution_plot_wrt_target(data, "Sgot Aspartate Aminotransferase", "Result")
- Liver patients (Target = 1) have significantly higher SGOT (AST) levels compared to non-liver patients.
- Strong right-skewed distribution, with extreme outliers exceeding 4000 U/L in liver disease cases.
- Boxplots confirm that median SGOT levels are much higher in liver patients, with a wider interquartile range (IQR).
- Non-liver patients mostly have SGOT levels below 50, while liver patients show a broader range with high variability.
distribution_plot_wrt_target(data, "Total Protiens", "Result")
- Total Protein distribution is similar for both liver and non-liver patients, showing a near-normal distribution.
- Slightly lower total protein levels in liver disease patients (Target = 1) compared to non-liver patients (Target = 2).
- Boxplots confirm a small difference in median values, but with overlapping interquartile ranges (IQR).
- Outliers exist in both groups, but no extreme differences, indicating Total Proteins alone is not a strong differentiator for liver disease.
distribution_plot_wrt_target(data, "\xa0ALB Albumin", "Result")
- Non-liver patients (Target = 2) generally have higher Albumin levels compared to liver patients (Target = 1).
- Liver disease patients show a slightly left-skewed distribution, indicating lower albumin levels on average.
- Boxplots confirm a lower median Albumin level in liver patients, with less variability compared to non-liver patients.
- Albumin levels could be a useful indicator of liver function, but some overlap exists between the two groups.
distribution_plot_wrt_target(data, "A/G Ratio Albumin and Globulin Ratio", "Result")
- Non-liver patients (Target = 2) have higher A/G ratios on average than liver patients (Target = 1).
- Liver disease patients exhibit a left-skewed distribution, indicating a tendency for lower A/G ratios.
- Boxplots confirm that the median A/G ratio is lower in liver disease cases, with more outliers at the lower end.
- A/G ratio may be a useful predictor of liver disease, but some overlap exists between the groups.
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    # A single legend call suffices (a second call would replace the first);
    # place it outside the axes
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1), frameon=False)
    plt.show()
stacked_barplot(data, "Age of the patient", "Result")
Result 1 2 All Age of the patient All 21915 8774 30689 45.0 1033 434 1467 42.0 887 398 1285 60.0 899 378 1277 50.0 936 372 1308 ... ... ... ... 80.0 5 4 9 84.0 12 3 15 77.0 8 2 10 89.0 0 2 2 83.0 1 0 1 [78 rows x 3 columns] ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Gender of the patient", "Result")
Result 1 2 All Gender of the patient All 21295 8494 29789 Male 15742 6244 21986 Female 5553 2250 7803 ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Total Bilirubin", "Result")
Result 1 2 All Total Bilirubin All 21483 8560 30043 0.7 2014 1915 3929 0.8 2886 1702 4588 0.9 1926 947 2873 0.6 1492 892 2384 ... ... ... ... 6.2 52 0 52 5.9 48 0 48 5.7 51 0 51 5.5 60 0 60 7.4 50 0 50 [114 rows x 3 columns] ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Direct Bilirubin", "Result")
Result 1 2 All Direct Bilirubin All 21536 8594 30130 0.2 5623 4185 9808 0.1 2037 1222 3259 0.3 1642 1047 2689 0.6 406 396 802 ... ... ... ... 5.2 59 0 59 5.5 53 0 53 5.6 51 0 51 6.0 55 0 55 4.6 48 0 48 [81 rows x 3 columns] ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "\xa0Alkphos Alkaline Phosphotase", "Result")
Result 1 2 All Alkphos Alkaline Phosphotase All 21354 8541 29895 145.0 149 305 454 180.0 201 293 494 165.0 153 267 420 158.0 255 242 497 ... ... ... ... 272.0 254 0 254 276.0 51 0 51 280.0 98 0 98 282.0 401 0 401 263.0 107 0 107 [264 rows x 3 columns] ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "\xa0Sgpt Alamine Aminotransferase", "Result")
Result 1 2 All Sgpt Alamine Aminotransferase All 21560 8593 30153 18.0 411 451 862 22.0 544 433 977 28.0 451 413 864 32.0 198 390 588 ... ... ... ... 97.0 52 0 52 96.0 105 0 105 95.0 110 0 110 94.0 50 0 50 93.0 53 0 53 [153 rows x 3 columns] ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Sgot Aspartate Aminotransferase", "Result")
Result 1 2 All Sgot Aspartate Aminotransferase All 21590 8639 30229 23.0 303 519 822 21.0 189 510 699 29.0 240 315 555 28.0 370 310 680 ... ... ... ... 126.0 51 0 51 125.0 109 0 109 116.0 51 0 51 114.0 49 0 49 150.0 53 0 53 [178 rows x 3 columns] ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Total Protiens", "Result")
Result 1 2 All Total Protiens All 21586 8642 30228 7.0 1190 481 1671 6.0 1067 468 1535 6.1 504 407 911 8.0 663 365 1028 7.3 582 364 946 6.8 1170 330 1500 5.9 401 328 729 5.2 305 313 618 7.1 853 303 1156 6.9 1020 300 1320 6.7 481 293 774 5.5 621 267 888 7.9 465 264 729 7.2 844 262 1106 6.2 958 258 1216 7.4 351 253 604 6.4 710 250 960 6.5 532 249 781 8.2 189 205 394 6.3 516 205 721 5.6 711 202 913 4.9 167 166 333 5.8 572 164 736 7.8 312 164 476 5.1 379 163 542 6.6 666 159 825 5.3 359 145 504 7.6 323 145 468 4.5 97 117 214 3.9 0 108 108 7.5 684 99 783 5.7 442 99 541 8.5 143 97 240 5.4 575 95 670 8.4 60 91 151 9.2 59 58 117 4.6 146 54 200 3.7 0 52 52 4.8 108 52 160 5.0 511 52 563 7.7 98 49 147 3.8 50 49 99 8.1 262 48 310 8.3 96 48 144 4.3 175 1 176 4.0 101 0 101 4.1 99 0 99 3.6 169 0 169 3.0 49 0 49 4.4 214 0 214 4.7 99 0 99 8.6 151 0 151 8.7 48 0 48 8.9 49 0 49 2.8 46 0 46 9.5 47 0 47 9.6 48 0 48 2.7 49 0 49 ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "\xa0ALB Albumin", "Result")
Result 1 2 All ALB Albumin All 21561 8636 30197 3.0 1777 581 2358 2.9 963 509 1472 3.5 699 475 1174 3.2 884 474 1358 4.0 1453 465 1918 3.9 851 433 1284 4.1 413 414 827 4.2 207 398 605 3.1 1119 362 1481 3.8 400 353 753 3.6 591 352 943 3.7 701 350 1051 2.3 325 311 636 3.3 772 311 1083 4.4 110 295 405 2.6 876 264 1140 2.5 979 253 1232 2.2 424 214 638 4.3 485 203 688 2.7 1120 162 1282 2.8 761 154 915 4.5 146 145 291 1.9 250 114 364 3.4 973 111 1084 1.6 311 108 419 2.4 810 108 918 2.0 991 108 1099 4.6 96 107 203 1.4 50 105 155 2.1 635 100 735 1.8 552 55 607 1.7 113 52 165 5.0 0 49 49 4.7 90 48 138 4.8 55 47 102 4.9 157 46 203 1.0 59 0 59 1.5 162 0 162 5.5 96 0 96 0.9 105 0 105 ------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "A/G Ratio Albumin and Globulin Ratio", "Result")
[Output elided: per-value crosstab of A/G Ratio Albumin and Globulin Ratio vs Result, 70 rows x 3 columns; totals: Result 1 = 21547, Result 2 = 8585, All = 30132]
------------------------------------------------------------------------------------------------------------------------
# Select only numerical columns
numeric_data = data.select_dtypes(include=[np.number])
# Compute correlation matrix
correlation_matrix = numeric_data.corr()
# Create heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Feature Correlation Heatmap")
plt.show()
EDA Analysis¶
- The dataset is imbalanced: Class 1 accounts for roughly 71% of records and Class 2 for roughly 29%.
- Some features have extreme values (outliers), especially Total Bilirubin, Direct Bilirubin, SGPT, SGOT, and Alkaline Phosphatase.
- Features such as Total Proteins, Albumin, and A/G Ratio are closer to normally distributed.
- Males are more represented than females.
- There is a higher proportion of liver disease in males than in females.
- There is no significant difference in age distribution between liver and non-liver patients.
- SGPT, SGOT, and bilirubin levels are strong indicators of liver disease.
- Liver patients tend to have lower albumin levels than non-liver patients.
Data Preprocessing¶
Remove columns with little or no relationship to the target column 'Result':
- Age of the patient (correlation ~0)
- A/G Ratio Albumin and Globulin Ratio (correlation ~0.16, relatively weak)
Potential Fixes
- Total Bilirubin and Direct Bilirubin are highly correlated (~0.89); drop one of them to avoid multicollinearity.
- Sgpt Alanine Aminotransferase and Sgot Aspartate Aminotransferase are strongly correlated (~0.78); keep one.
- Total Proteins and ALB Albumin are correlated (~0.78); if needed, keep only one.
- Skewness: variables like bilirubin and the aminotransferases are highly right-skewed; consider a log transformation.
- Categorical encoding: if "Gender of the patient" is kept in the dataset, encode it properly.
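Before dropping columns by hand, the highly correlated pairs flagged above can be detected programmatically. A minimal sketch on toy data (the column names and values below are illustrative, not the real dataset):

```python
import numpy as np
import pandas as pd

def correlated_pairs(df: pd.DataFrame, threshold: float = 0.75):
    """Return column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is reported once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [(a, b, round(float(upper.loc[a, b]), 2))
            for a in upper.index for b in upper.columns
            if pd.notna(upper.loc[a, b]) and upper.loc[a, b] > threshold]

# Toy frame mimicking two collinear lab values (made-up numbers)
toy = pd.DataFrame({
    "total_bili":  [0.7, 10.9, 7.3, 1.0, 3.9, 2.2],
    "direct_bili": [0.1, 5.5, 4.1, 0.4, 2.0, 1.0],
    "albumin":     [3.3, 3.2, 3.3, 3.4, 2.4, 3.0],
})
print(correlated_pairs(toy))
```

Run against the real DataFrame, this would flag the bilirubin, aminotransferase, and protein pairs noted above, so only one member of each pair needs to be kept.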
data.drop(['Age of the patient', 'Gender of the patient', 'A/G Ratio Albumin and Globulin Ratio', 'Direct Bilirubin', '\xa0Sgpt Alamine Aminotransferase', 'Total Protiens'], axis = 1, inplace = True)
data.head()
| Total Bilirubin | Alkphos Alkaline Phosphotase | Sgot Aspartate Aminotransferase | ALB Albumin | Result | |
|---|---|---|---|---|---|
| 0 | 0.7 | 187.0 | 18.0 | 3.3 | 1 |
| 1 | 10.9 | 699.0 | 100.0 | 3.2 | 1 |
| 2 | 7.3 | 490.0 | 68.0 | 3.3 | 1 |
| 3 | 1.0 | 182.0 | 20.0 | 3.4 | 1 |
| 4 | 3.9 | 195.0 | 59.0 | 2.4 | 1 |
Null Values
data.isnull().sum()
| 0 | |
|---|---|
| Total Bilirubin | 648 |
| Alkphos Alkaline Phosphotase | 796 |
| Sgot Aspartate Aminotransferase | 462 |
| ALB Albumin | 494 |
| Result | 0 |
data.columns.to_list()
['Total Bilirubin', '\xa0Alkphos Alkaline Phosphotase', 'Sgot Aspartate Aminotransferase', '\xa0ALB Albumin', 'Result']
# Get the list of columns with missing values
columns_with_missing = [
'Total Bilirubin',
'\xa0Alkphos Alkaline Phosphotase',
'Sgot Aspartate Aminotransferase',
'\xa0ALB Albumin'
]
# Create a copy of the DataFrame
df_null = data.copy()
# Iterate over each column with missing values
for col in columns_with_missing:
    # Select the column with missing values
    col_data = df_null[col].copy()

    # Create columns with different NaN handling methods
    col_bfill = col_data.bfill()                               # Backward fill
    col_ffill = col_data.ffill()                               # Forward fill
    col_interpolation = col_data.interpolate(method='linear')  # Linear interpolation
    col_mean = col_data.fillna(col_data.mean())                # Fill with mean
    col_mode = col_data.fillna(col_data.mode()[0])             # Fill with mode

    # Create a list of methods and their corresponding labels
    methods = [
        ("Original", col_data),
        ("Backward Fill", col_bfill),
        ("Forward Fill", col_ffill),
        ("Interpolation", col_interpolation),
        ("Mean Fill", col_mean),
        ("Mode Fill", col_mode)
    ]

    # Create subplots: 2 rows, 3 columns with a larger figure size
    fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(22, 12))  # Increased figsize for better visibility
    fig.suptitle(f"Comparison of Missing Data Handling Methods for {col}", fontsize=20)

    # Flatten axes for easier indexing
    axes = axes.flatten()

    # Plot each method in its own subplot
    for i, (label, method) in enumerate(methods):
        # Plot the method data
        axes[i].plot(method, linewidth=2, label=label, color='gray' if label != "Original" else 'blue')

        # Highlight missing data positions
        if label == "Original":
            # Red scatter points for missing data in the original column
            axes[i].scatter(method.index[method.isnull()], [0] * method.isnull().sum(),
                            color='red', label='Missing Data (Original)', zorder=3)
        else:
            # Red scatter points where filling occurred
            missing_indices = col_data.isnull() & ~method.isnull()  # Locations where fill happened
            axes[i].scatter(missing_indices.index[missing_indices], method[missing_indices],
                            color='red', label='Filled Data', zorder=3)

        # Add titles and labels
        axes[i].set_title(label, fontsize=16)
        axes[i].set_xlabel('Index', fontsize=14)
        axes[i].set_ylabel('Value', fontsize=14)
        axes[i].grid(True)
        axes[i].legend(fontsize=12)

    # Adjust layout to prevent overlap
    plt.tight_layout(rect=[0, 0, 1, 0.95])  # Leave space for suptitle
    plt.show()
- Interpolation is ideal for fluctuating variables (Alkphos & Sgot) to maintain data consistency.
- Mean/Mode Fill is best for stable variables (Bilirubin & ALB) to preserve overall distribution.
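One way to sanity-check these choices is to compare summary statistics under each strategy; mean fill, for example, leaves the column mean unchanged by construction. A small sketch on an illustrative series (the values are made up, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy series with gaps, standing in for a lab-value column
s = pd.Series([0.7, np.nan, 7.3, 1.0, np.nan, 3.9, 2.2, np.nan, 0.9])

fills = {
    "mean":        s.fillna(s.mean()),
    "mode":        s.fillna(s.mode()[0]),
    "interpolate": s.interpolate(method="linear"),
}

for name, filled in fills.items():
    # Mean/std shifts show how much each strategy distorts the distribution
    print(f"{name:12s} mean={filled.mean():.2f} std={filled.std():.2f}")
```

Mean fill preserves the mean exactly but shrinks the variance, while interpolation tracks local structure; this is the trade-off behind the choices above.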
data["Total Bilirubin"] = data["Total Bilirubin"].fillna(data["Total Bilirubin"].mean())
data["\xa0ALB Albumin"] = data["\xa0ALB Albumin"].fillna(data["\xa0ALB Albumin"].mode()[0])
data["\xa0Alkphos Alkaline Phosphotase"] = data["\xa0Alkphos Alkaline Phosphotase"].interpolate(method="linear")
data["Sgot Aspartate Aminotransferase"] = data["Sgot Aspartate Aminotransferase"].interpolate(method="linear")
data.isnull().sum()
| 0 | |
|---|---|
| Total Bilirubin | 0 |
| Alkphos Alkaline Phosphotase | 0 |
| Sgot Aspartate Aminotransferase | 0 |
| ALB Albumin | 0 |
| Result | 0 |
data.duplicated().sum()
28908
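With only four coarse features left, many rows collapse into identical combinations, which explains the very high duplicate count. Whether to drop them depends on whether repeats represent distinct patients or true re-measurements. A minimal sketch of inspecting the effect on a toy frame (illustrative values, hypothetical column names):

```python
import pandas as pd

# Toy frame with repeated rows, standing in for the reduced feature set
df = pd.DataFrame({
    "bili":   [0.7, 0.7, 7.3, 0.7, 3.9],
    "result": [1,   1,   1,   1,   2],
})

n_dup = df.duplicated().sum()   # rows identical to an earlier row
deduped = df.drop_duplicates()

print(f"{n_dup} duplicate rows; {len(df)} -> {len(deduped)} after drop_duplicates()")
```

If the 28,908 duplicates here are distinct patients who happen to share lab values, dropping them would discard real class-frequency information, so they are kept.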
data.head()
| Total Bilirubin | Alkphos Alkaline Phosphotase | Sgot Aspartate Aminotransferase | ALB Albumin | Result | |
|---|---|---|---|---|---|
| 0 | 0.7 | 187.0 | 18.0 | 3.3 | 1 |
| 1 | 10.9 | 699.0 | 100.0 | 3.2 | 1 |
| 2 | 7.3 | 490.0 | 68.0 | 3.3 | 1 |
| 3 | 1.0 | 182.0 | 20.0 | 3.4 | 1 |
| 4 | 3.9 | 195.0 | 59.0 | 2.4 | 1 |
Separate Independent and Dependent Columns
## Separating Independent and Dependent Columns
X = data.drop(['Result'],axis=1)
Y = data[['Result']]
# Import the required function
from sklearn.model_selection import train_test_split
# Splitting the dataset into Training and Testing sets
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42, stratify=Y)
# Convert labels to binary format (1,2 → 0,1) for model compatibility
y_train = (y_train == 2).astype(int)
y_test = (y_test == 2).astype(int)
x_train.head()
| Total Bilirubin | Alkphos Alkaline Phosphotase | Sgot Aspartate Aminotransferase | ALB Albumin | |
|---|---|---|---|---|
| 18983 | 0.7 | 162.0 | 41.0 | 2.5 |
| 8417 | 1.7 | 859.0 | 48.0 | 3.0 |
| 14114 | 6.8 | 542.0 | 66.0 | 3.1 |
| 15253 | 2.2 | 209.0 | 20.0 | 4.0 |
| 14647 | 2.6 | 236.0 | 90.0 | 2.6 |
y_train.head()
| Result | |
|---|---|
| 18983 | 1 |
| 8417 | 0 |
| 14114 | 0 |
| 15253 | 0 |
| 14647 | 0 |
### Checking the shape of train and test sets
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
(24552, 4)
(6139, 4)
(24552, 1)
(6139, 1)
sns.countplot(data=data, x='Result', edgecolor = 'black');
Y["Result"].value_counts(normalize=True)
| proportion | |
|---|---|
| Result | |
| 1 | 0.714118 |
| 2 | 0.285882 |
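The effect of stratify=Y can be verified by checking that class proportions match almost exactly across the split. A sketch on synthetic labels with a similar ~71/29 imbalance (the data below is simulated, not the patient records):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic imbalanced labels (~71% / ~29%), mirroring the Result column
y = (rng.random(10_000) < 0.286).astype(int)
X = rng.normal(size=(10_000, 4))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Stratification keeps the positive rate nearly identical across the split
print(f"train: {y_tr.mean():.3f}  test: {y_te.mean():.3f}  overall: {y.mean():.3f}")
```

Without stratify, the test-set class balance would drift by sampling noise, which matters when the minority class is the clinically important one.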
Model Building¶
Balance Weights
- Class weights address the imbalance in the dataset, where Class 1 makes up 71.4% of records and Class 2 only 28.6%. Without balancing, the model tends to favor the majority class and performs poorly at detecting the minority class. Assigning a higher weight to the minority class forces the model to learn patterns from both classes fairly, improving minority-class recall. The resulting weight dictionary can be passed to model.fit() via the class_weight argument to prevent bias toward the dominant class.
from sklearn.utils.class_weight import compute_class_weight
classes = np.unique(Y) # Keep classes as 1 and 2
class_weights = compute_class_weight(class_weight="balanced", classes=classes, y=Y.values.ravel())
class_weight_dict = {classes[i]: class_weights[i] for i in range(len(classes))}
print(class_weight_dict) # Should correctly map weights for {1: weight1, 2: weight2}
{1: 0.7001642560569421, 2: 1.7489742420788694}
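Note that these weights are keyed by the original labels 1 and 2, while the training labels were remapped to 0/1 above, so the dictionary would need re-keying before it could actually be passed to model.fit(). A minimal sketch (weight values copied from the printout above):

```python
# The labels fed to the network were remapped 1 -> 0 and 2 -> 1, so the
# class-weight dict must be re-keyed the same way before use in model.fit().
class_weight_dict = {1: 0.7001642560569421, 2: 1.7489742420788694}
remapped = {old - 1: w for old, w in class_weight_dict.items()}
print(remapped)
# Usage (sketch): model.fit(x_train, y_train, class_weight=remapped, ...)
```

Passing the dict with keys {1, 2} against 0/1 labels would silently leave class 0 unweighted, so the remap is not optional.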
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
Model 1 - "relu"¶
# Import necessary modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Initializing the ANN
model1 = Sequential()
# This adds the input layer (by specifying input_dim) AND the first hidden layer (units)
model1.add(Dense(64, activation='relu', input_dim=4))  # input_dim=4 matches the four remaining features
# Add a second hidden layer
model1.add(Dense(32, activation='relu'))  # Hidden layer
# Adding the output layer
# Notice that we do not need to specify input_dim here.
# A single output node gives the desired output dimension (liver disease or not)
# We use the sigmoid activation because we want a probability outcome
model1.add(Dense(1, activation='sigmoid'))  # Output layer
# Create optimizer with default learning rate
# Compile the model
model1.compile(optimizer='SGD', loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ dense (Dense) │ (None, 64) │ 320 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_1 (Dense) │ (None, 32) │ 2,080 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_2 (Dense) │ (None, 1) │ 33 │ └──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 2,433 (9.50 KB)
Trainable params: 2,433 (9.50 KB)
Non-trainable params: 0 (0.00 B)
Train the Model
history = model1.fit(
x_train, y_train,
validation_split=0.2,
epochs=50,
batch_size=32,
verbose=1
)
[Training log elided: 50 epochs x 614 steps. Training loss fell from 2.09 (epoch 1) to ~0.51 (epoch 50) with accuracy steady near 0.71; validation loss drifted from 0.533 to ~0.516, with validation accuracy between ~0.70 and ~0.73.]
# Capturing learning history per epoch
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
# Plotting accuracy at different epochs
plt.plot(hist['loss'])
plt.plot(hist['val_loss'])
plt.legend(("train" , "valid") , loc =0)
#Printing results
results = model1.evaluate(x_test, y_test)
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7020 - loss: 0.5253
- Initial spike: the model starts with a high loss (~2.1) that drops sharply within the first epoch.
- Stabilization: loss flattens near 0.52, suggesting the model is learning but has stalled.
- No overfitting: train and validation curves stay close, indicating good generalization.
- Potential issue: a loss of ~0.52 is only modestly better than the ~0.60 achievable by always predicting the class prior, so the features are adding limited signal. The model may need feature engineering, a better architecture, or a different loss function.
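To put the plateau in context, a model that ignores the features and always predicts the class prior already achieves a binary cross-entropy of about 0.60 on a 71/29 split; a quick check:

```python
import numpy as np

p = 0.286  # positive-class share from the value_counts output above (approx.)
# Binary cross-entropy of always predicting the prior probability p
baseline = -(p * np.log(p) + (1 - p) * np.log(1 - p))
print(f"prior-only baseline loss = {baseline:.3f}")
```

The trained model's ~0.52 sits only slightly below this floor, which is why the analysis above treats the plateau as a sign of weak learning rather than success.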
y_pred=model1.predict(x_test)
y_pred = (y_pred > 0.5) # cut off point (threshold)
y_pred
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
array([[False],
[ True],
[False],
...,
[False],
[False],
[False]])
1 = Non-Liver Patient
2 = Liver Patient
False Positive (FP): Misclassifying a Non-Liver Patient as a Liver Patient
- What happens? A healthy person (non-liver patient) is wrongly classified as having liver disease.
- Risk:
- They might undergo unnecessary medical tests, treatments, or even invasive procedures.
- Increased emotional stress and financial burden due to unnecessary healthcare costs.
- Misallocation of medical resources that could be used for actual liver patients.
False Negative (FN): Misclassifying a Liver Patient as a Non-Liver Patient
- What happens? A person with liver disease is wrongly classified as healthy.
- Risk:
- The most critical risk—delayed diagnosis and treatment.
- Disease could worsen, leading to complications like liver failure or cirrhosis.
- Higher mortality risk if the condition is not treated in time.
- False negatives are worse, because they lead to untreated disease, which can become life-threatening.
- False positives should also be minimized, to avoid unnecessary medical interventions.
Create custom confusion matrix
def make_confusion_matrix(cf,
                          group_names=None,
                          categories='auto',
                          count=True,
                          percent=True,
                          cbar=True,
                          xyticks=True,
                          xyplotlabels=True,
                          sum_stats=True,
                          figsize=None,
                          cmap='Blues',
                          title=None):
    '''
    This function will make a pretty plot of an sklearn Confusion Matrix cm
    using a Seaborn heatmap visualization.
    '''
    # CODE TO GENERATE TEXT INSIDE EACH SQUARE
    blanks = ['' for i in range(cf.size)]

    if group_names and len(group_names) == cf.size:
        group_labels = ["{}\n".format(value) for value in group_names]
    else:
        group_labels = blanks

    if count:
        group_counts = ["{0:0.0f}\n".format(value) for value in cf.flatten()]
    else:
        group_counts = blanks

    if percent:
        group_percentages = ["{0:.2%}".format(value) for value in cf.flatten() / np.sum(cf)]
    else:
        group_percentages = blanks

    box_labels = [f"{v1}{v2}{v3}".strip() for v1, v2, v3 in zip(group_labels, group_counts, group_percentages)]
    box_labels = np.asarray(box_labels).reshape(cf.shape[0], cf.shape[1])

    # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS
    if sum_stats:
        # Accuracy is sum of diagonal divided by total observations
        accuracy = np.trace(cf) / float(np.sum(cf))

    # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS
    if figsize is None:
        # Get default figure size if not set
        figsize = plt.rcParams.get('figure.figsize')

    if xyticks is False:
        # Do not show categories if xyticks is False
        categories = False

    # MAKE THE HEATMAP VISUALIZATION
    plt.figure(figsize=figsize)
    sns.heatmap(cf, annot=box_labels, fmt="", cmap=cmap, cbar=cbar,
                xticklabels=categories, yticklabels=categories)

    if title:
        plt.title(title)
# Import necessary libraries
from sklearn.metrics import confusion_matrix, classification_report
from sklearn import metrics
# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Define labels and categories for visualization
labels = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
categories = ['Non-Liver Patient', 'Liver Patient'] # Corrected categories
# Display the confusion matrix
make_confusion_matrix(cm,
group_names=labels,
categories=categories,
cmap='Blues')
# Print the classification report
print("Classification Report:\n")
print(metrics.classification_report(y_test, y_pred))
Classification Report:
precision recall f1-score support
0 0.76 0.87 0.81 4384
1 0.50 0.32 0.39 1755
accuracy 0.71 6139
macro avg 0.63 0.59 0.60 6139
weighted avg 0.69 0.71 0.69 6139
- Accuracy: 71% – The model is moderately accurate.
- Precision & Recall:
- Non-Liver Patient (0): Precision = 76%, Recall = 87% (good at identifying non-liver patients).
- Liver Patient (1): Precision = 50%, Recall = 32% (poor recall, many liver patients misclassified).
- Confusion Matrix:
- True Negatives (TN): 3829 (62.37%) – Correctly classified non-liver patients.
- False Positives (FP): 555 (9.04%) – Misclassified non-liver patients as liver patients.
- False Negatives (FN): 1200 (19.55%) – Misclassified liver patients as non-liver.
- True Positives (TP): 555 (9.04%) – Correctly classified liver patients.
- Key Issue: High false negatives indicate the model struggles to detect liver patients. Consider better class balancing, adjusting thresholds, or improving feature selection.
Model 2 - Deeper Network¶
- Create more hidden layers
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
model2 = Sequential()
#Adding the hidden and output layers
model2.add(Dense(256,activation='relu',kernel_initializer='he_uniform',input_dim = x_train.shape[1]))
model2.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
model2.add(Dense(64,activation='relu',kernel_initializer='he_uniform'))
model2.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
model2.add(Dense(1, activation = 'sigmoid'))
#Compiling the ANN with Adam optimizer and binary cross entropy loss function
optimizer = tf.keras.optimizers.Adam(0.001)
model2.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
model2.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ dense (Dense) │ (None, 256) │ 1,280 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_1 (Dense) │ (None, 128) │ 32,896 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_2 (Dense) │ (None, 64) │ 8,256 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_3 (Dense) │ (None, 32) │ 2,080 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_4 (Dense) │ (None, 1) │ 33 │ └──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 44,545 (174.00 KB)
Trainable params: 44,545 (174.00 KB)
Non-trainable params: 0 (0.00 B)
history2 = model2.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
[Training log elided: 50 epochs x 307 steps. Training loss fell from 5.91 (epoch 1) to ~0.50 (epoch 50) as accuracy rose from 0.64 to ~0.72; validation loss was volatile early (spiking to 1.84) before settling near 0.50-0.51, with validation accuracy between ~0.62 and ~0.73.]
#Plotting Train Loss vs Validation Loss
plt.plot(history2.history['loss'])
plt.plot(history2.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- The loss decreases rapidly in the initial epochs and stabilizes around 0.5, indicating effective learning.
- Spikes in validation loss at the start suggest fluctuations, but it aligns well with training loss later, implying no severe overfitting.
- The model appears to generalize reasonably well.
ROC (Receiver Operating Characteristic) curve
- Evaluates the classification model by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold levels.
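Alongside the threshold search below, the whole curve can be summarized by its AUC. A sketch on synthetic scores (the score distributions here are simulated, not the model's outputs):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
# Synthetic scores: positives drawn from a higher-mean distribution
y_true = np.r_[np.zeros(500, dtype=int), np.ones(500, dtype=int)]
scores = np.r_[rng.normal(0.4, 0.15, 500), rng.normal(0.6, 0.15, 500)].clip(0, 1)

fpr, tpr, thr = roc_curve(y_true, scores)
print(f"AUC = {auc(fpr, tpr):.3f}")

# G-mean = sqrt(TPR * TNR); its maximum balances sensitivity and specificity
gmeans = np.sqrt(tpr * (1 - fpr))
best = np.argmax(gmeans)
print(f"best threshold = {thr[best]:.3f}, G-mean = {gmeans[best]:.3f}")
```

An AUC near 0.5 means no discrimination and near 1.0 near-perfect ranking; the same G-mean criterion is applied to the real model's predictions in the next cell.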
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat1 = model2.predict(x_test)
# keep probabilities for the positive outcome only
yhat1 = yhat1[:, 0]
# calculate roc curves
fpr, tpr, thresholds1 = roc_curve(y_test, yhat1)
# calculate the g-mean for each threshold
gmeans1 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans1)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds1[ix], gmeans1[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step Best Threshold=0.420847, G-Mean=0.700
- Curve Performance: The model is better than random guessing (the diagonal line) but not yet strong; a better classifier's curve would hug the top-left corner.
- Best Threshold: The black dot marks the threshold that best balances sensitivity and specificity according to the G-Mean.
- Improvements Needed: Since the curve is well short of the top-left corner, consider improving feature selection, balancing the dataset, or tuning the model's hyperparameters.
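To put a single number on the curve, the area under it (AUC) can be computed with scikit-learn. A minimal sketch on synthetic labels and scores (the variable names here are illustrative, not the notebook's `y_test`/`yhat1`):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=500)                     # synthetic 0/1 labels
# scores correlated with the labels, so AUC should land well above 0.5
y_score = np.clip(0.4 * y_true + 0.6 * rng.random(500), 0, 1)

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")   # 0.5 = chance level, 1.0 = perfect separation
```

An AUC near 0.5 confirms the "close to the diagonal" reading of the plot, while values approaching 1 correspond to a curve hugging the top-left corner.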
Tuning the threshold using ROC-AUC
#Predicting the results using best as a threshold
y_pred_e1=model2.predict(x_test)
y_pred_e1 = (y_pred_e1 > thresholds1[ix]) # threshold inputted
y_pred_e1
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
array([[False],
[ True],
[ True],
...,
[ True],
[False],
[False]])
# Import necessary libraries
from sklearn.metrics import confusion_matrix, classification_report
from sklearn import metrics
# Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred_e1)
# Define labels and categories for visualization
labels = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
categories = ['Non-Liver Patient', 'Liver Patient'] # Corrected categories
# Display the confusion matrix
make_confusion_matrix(cm,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
# Print the classification report
print("Classification Report:\n")
print(metrics.classification_report(y_test, y_pred_e1))
Classification Report:
precision recall f1-score support
0 0.89 0.60 0.72 4384
1 0.45 0.82 0.58 1755
accuracy 0.66 6139
macro avg 0.67 0.71 0.65 6139
weighted avg 0.77 0.66 0.68 6139
- False Positives (28.62%): Healthy individuals misclassified as liver patients, leading to unnecessary anxiety and treatment.
- False Negatives (5.21%): Liver patients wrongly classified as healthy, posing serious health risks due to missed diagnosis.
- Recall for Liver Patients (82%): Good at identifying actual cases but low precision (45%) means many false alarms.
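One way to trade some of those false alarms against missed diagnoses is to weight the classes during training rather than only shifting the decision threshold afterwards. A sketch of computing balanced weights from the class counts in the report's support column (4384 vs. 1755); feeding the resulting dict to Keras `fit` via `class_weight=` is an assumption about how it would be wired in, not something this notebook does:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# class counts taken from the classification report's support column
y = np.array([0] * 4384 + [1] * 1755)
weights = compute_class_weight(class_weight='balanced',
                               classes=np.array([0, 1]), y=y)
class_weight = dict(zip([0, 1], weights))
print(class_weight)   # the minority class (1) receives the larger weight
```

With `'balanced'`, each class weight is `n_samples / (n_classes * count)`, so the minority (liver-patient) class contributes more to the loss per example.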
Model 3 - Batch Normalization technique¶
Normalize the activations after each layer and use fewer layers
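Per feature, batch normalization standardizes activations using the batch mean and variance, then rescales them with learnable parameters γ (scale) and β (shift). A minimal NumPy sketch of the training-mode forward pass (ignoring the running averages Keras also maintains for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each column of a (batch, features) array."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta             # learnable scale and shift

x = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 4))
out = batch_norm(x)
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))
```

Keeping layer inputs standardized this way stabilizes gradients, which is why the BatchNorm model below trains to a noticeably lower loss than Model 2.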
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Import necessary modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
# Initialize the ANN
model3 = Sequential()
# Add input layer with first hidden layer
model3.add(Dense(128, activation='relu', input_dim=x_train.shape[1]))
# Add Batch Normalization
model3.add(BatchNormalization())
# Add more hidden layers
model3.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
model3.add(BatchNormalization())
model3.add(Dense(32, activation='relu', kernel_initializer='he_uniform'))
# Add output layer
model3.add(Dense(1, activation='sigmoid'))
model3.summary()
Model: "sequential"
Layer (type)                                Output Shape     Param #
dense (Dense)                               (None, 128)          640
batch_normalization (BatchNormalization)    (None, 128)          512
dense_1 (Dense)                             (None, 64)         8,256
batch_normalization_1 (BatchNormalization)  (None, 64)           256
dense_2 (Dense)                             (None, 32)         2,080
dense_3 (Dense)                             (None, 1)             33
Total params: 11,777 (46.00 KB)
Trainable params: 11,393 (44.50 KB)
Non-trainable params: 384 (1.50 KB)
optimizer = tf.keras.optimizers.Adam(0.001)
model3.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_3 = model3.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6675 - loss: 0.5824 - val_accuracy: 0.7094 - val_loss: 0.5064
[epochs 2–49 omitted: training loss falls steadily from ~0.58 to ~0.34; val_loss declines from ~0.51 to ~0.36 with fluctuations]
Epoch 50/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8335 - loss: 0.3433 - val_accuracy: 0.8318 - val_loss: 0.3608
#Plotting Train Loss vs Validation Loss
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- Training loss decreases steadily, indicating the model is learning.
- Validation loss fluctuates but follows a downward trend, suggesting generalization.
- Slight overfitting: validation loss is higher and more unstable than training loss.
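A common remedy for this instability is early stopping: halt training once validation loss stops improving for a set number of epochs. Keras provides this as `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)`; the underlying logic, sketched in plain Python on a hypothetical loss history:

```python
def early_stop_epoch(val_losses, patience=5, min_delta=1e-4):
    """Return the epoch index at which training would stop."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:     # improvement: record it, reset counter
            best, wait = loss, 0
        else:                           # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1          # patience never exhausted

losses = [0.60, 0.55, 0.52, 0.53, 0.54, 0.53, 0.55, 0.54]
print(early_stop_epoch(losses, patience=3))   # stops at epoch 5
```

With `restore_best_weights=True`, Keras would additionally roll the model back to the epoch with the lowest validation loss rather than keeping the final weights.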
Receiver Operating Characteristic (ROC) curve
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat2 = model3.predict(x_test)
# keep probabilities for the positive outcome only
yhat2 = yhat2[:, 0]
# calculate roc curves
fpr, tpr, thresholds2 = roc_curve(y_test, yhat2)
# calculate the g-mean for each threshold
gmeans2 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans2)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds2[ix], gmeans2[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step Best Threshold=0.209728, G-Mean=0.797
- The ROC curve shows the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR).
- The model performs better than random guessing (diagonal line), with a G-Mean of 0.797 at the best threshold (~0.21).
- A higher curve indicates better classification ability.
- However, performance may still need improvement if False Positives or False Negatives are critical.
y_pred_e2=model3.predict(x_test)
y_pred_e2 = (y_pred_e2 > thresholds2[ix])
y_pred_e2
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
array([[False],
[ True],
[False],
...,
[ True],
[False],
[False]])
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm2=confusion_matrix(y_test, y_pred_e2)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = ['Non-Liver Patient', 'Liver Patient']
make_confusion_matrix(cm2,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
#Accuracy as per the classification report
from sklearn import metrics
cr2=metrics.classification_report(y_test,y_pred_e2)
print(cr2)
precision recall f1-score support
0 0.96 0.69 0.80 4384
1 0.54 0.92 0.68 1755
accuracy 0.76 6139
macro avg 0.75 0.81 0.74 6139
weighted avg 0.84 0.76 0.77 6139
- The confusion matrix shows that the model correctly predicts 49.14% (3017) True Negatives and 26.37% (1619) True Positives.
- However, 22.27% (1367) are False Positives, i.e. healthy individuals flagged as liver patients; only 2.22% (136) are False Negatives. If False Positives are costly, precision needs to improve.
- The classification report shows an overall accuracy of 76%. Class 0 (Non-Liver Patients) has high precision (96%) but lower recall (69%): its negative predictions are reliable, but many non-liver patients are still flagged as positive.
- Class 1 (Liver Patients) has moderate precision (54%) but high recall (92%): it catches most true cases at the cost of many false alarms.
- The model prioritizes recall for liver patients, reducing False Negatives, which is crucial in medical applications.
Model 4 - Dropout technique¶
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
# Import necessary modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
# Initialize the ANN
model4 = Sequential()
# Add input layer with first hidden layer
model4.add(Dense(256, activation='relu', input_dim=x_train.shape[1]))
model4.add(Dropout(0.2))
# Add more hidden layers with Dropout for regularization
model4.add(Dense(128, activation='relu'))
model4.add(Dropout(0.2))
model4.add(Dense(64, activation='relu'))
model4.add(Dropout(0.2))
model4.add(Dense(32, activation='relu'))
# Add output layer
model4.add(Dense(1, activation='sigmoid'))
model4.summary()
Model: "sequential"
Layer (type)           Output Shape    Param #
dense (Dense)          (None, 256)       1,280
dropout (Dropout)      (None, 256)           0
dense_1 (Dense)        (None, 128)      32,896
dropout_1 (Dropout)    (None, 128)           0
dense_2 (Dense)        (None, 64)        8,256
dropout_2 (Dropout)    (None, 64)            0
dense_3 (Dense)        (None, 32)        2,080
dense_4 (Dense)        (None, 1)            33
Total params: 44,545 (174.00 KB)
Trainable params: 44,545 (174.00 KB)
Non-trainable params: 0 (0.00 B)
optimizer = tf.keras.optimizers.Adam(0.001)
model4.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_4 = model4.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6440 - loss: 1.7209 - val_accuracy: 0.7052 - val_loss: 0.5486
[epochs 2–49 omitted: training loss drops from ~0.56 to ~0.47; val_loss declines from ~0.54 to ~0.47]
Epoch 50/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7489 - loss: 0.4741 - val_accuracy: 0.7526 - val_loss: 0.4599
#Plotting Train Loss vs Validation Loss
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- The loss curve shows a sharp initial drop, stabilizing after a few epochs.
- Training and validation losses decrease consistently and stay close, indicating good generalization with no severe overfitting.
- The model continues learning, but improvements slow after ~10 epochs.
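Dropout randomly zeroes a fraction of activations during training; under the inverted-dropout convention Keras uses, the surviving units are scaled up by 1/(1 − rate) so the expected activation is unchanged and no rescaling is needed at inference. A minimal NumPy sketch:

```python
import numpy as np

def dropout(x, rate=0.2, rng=None):
    """Inverted dropout: zero out ~`rate` of units, rescale the rest."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate      # keep each unit with prob (1 - rate)
    return x * mask / (1.0 - rate)          # rescaling preserves E[output]

x = np.ones((10000, 1))
out = dropout(x, rate=0.2)
print(out.mean())   # close to 1.0: expectation preserved
```

Because a different random subnetwork is trained each step, the network cannot rely on any single co-adapted unit, which is the regularization effect exploited in Model 4.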
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat3 = model4.predict(x_test)
# keep probabilities for the positive outcome only
yhat3 = yhat3[:, 0]
# calculate roc curves
fpr, tpr, thresholds3 = roc_curve(y_test, yhat3)
# calculate the g-mean for each threshold
gmeans3 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans3)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds3[ix], gmeans3[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step Best Threshold=0.345194, G-Mean=0.728
- The ROC curve indicates moderate model performance with a G-Mean of 0.728, balancing sensitivity and specificity.
- The curve rises above the no-skill line, showing predictive power, but there's room for improvement.
- The best threshold (0.345) suggests an optimal trade-off between false positives and false negatives.
y_pred_e3=model4.predict(x_test)
y_pred_e3 = (y_pred_e3 > thresholds3[ix])
y_pred_e3
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
array([[False],
[ True],
[ True],
...,
[ True],
[False],
[False]])
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm3=confusion_matrix(y_test, y_pred_e3)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm3,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
#Accuracy as per the classification report
from sklearn import metrics
cr3=metrics.classification_report(y_test,y_pred_e3)
print(cr3)
precision recall f1-score support
0 0.91 0.63 0.74 4384
1 0.48 0.84 0.61 1755
accuracy 0.69 6139
macro avg 0.69 0.74 0.68 6139
weighted avg 0.78 0.69 0.71 6139
- True Negative (45.12%): Non-liver patients correctly identified.
- False Positive (26.29%): Non-liver patients misclassified as liver patients, leading to unnecessary concern or treatment.
- False Negative (4.63%): Liver patients misclassified as non-liver patients, posing a serious risk of missed diagnosis.
- True Positive (23.96%): Liver patients correctly identified.
- Recall for Class 1 (Liver Patient) is 0.84, meaning the model captures 84% of actual liver patients, reducing missed diagnoses.
- Precision for Class 1 is 0.48, indicating many false positives.
- Overall Accuracy: 69%, suggesting room for improvement.
- False Negatives are low (4.63%), which is positive for healthcare applications where missing true cases is critical.
Model 5 - Random Search CV¶
- Hyperparameters
- Type of Architecture
- Number of Layers
- Number of Neurons in a layer
- Regularization hyperparameters
- Learning Rate
- Type of Optimizer
- Dropout Rate
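Random search draws independent samples from a space over hyperparameters like those listed above, rather than enumerating every combination as grid search does. A toy sketch of one draw (the keys and candidate values here are illustrative; the search actually run below covers only learning rate and batch size):

```python
import random

random.seed(42)
search_space = {
    "n_hidden_layers": [2, 3, 4],
    "units_per_layer": [32, 64, 128, 256],
    "dropout_rate":    [0.0, 0.2, 0.3],
    "learning_rate":   [0.01, 0.001, 0.0001],
    "optimizer":       ["adam", "rmsprop", "sgd"],
}
# one candidate configuration: an independent choice per hyperparameter
candidate = {name: random.choice(values) for name, values in search_space.items()}
print(candidate)
```

`RandomizedSearchCV` repeats such draws `n_iter` times (10 by default) and cross-validates each candidate, which is why 9 candidates over 3 folds yields 27 fits in the output below.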
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
def create_model_v4(input_dim):
    np.random.seed(1337)
    model5 = Sequential()
    model5.add(Dense(256, activation='relu', input_dim=input_dim))
    model5.add(Dropout(0.3))
    model5.add(Dense(128, activation='relu'))
    model5.add(Dense(64, activation='relu'))
    model5.add(Dense(32, activation='relu'))
    model5.add(Dense(1, activation='sigmoid'))
    optimizer = tf.keras.optimizers.Adam()
    model5.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model5
# Import necessary modules
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV
# Define Keras estimator
keras_estimator = KerasClassifier(build_fn=create_model_v4, input_dim=x_train.shape[1], optimizer="Adam", verbose=1)
# Define the grid search parameters
learn_rate = [0.01, 0.1, 0.001]
batch_size = [32, 64, 128]
param_random = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)
kfold_splits = 3
# Note: named random_search to avoid shadowing the imported `random` module
random_search = RandomizedSearchCV(estimator=keras_estimator,
                                   verbose=1,
                                   cv=kfold_splits,
                                   param_distributions=param_random,
                                   n_jobs=-1)
random_result = random_search.fit(x_train, y_train, validation_split=0.2, verbose=1)
# Summarize results
print("Best: %f using %s" % (random_result.best_score_, random_result.best_params_))
means = random_result.cv_results_['mean_test_score']
stds = random_result.cv_results_['std_test_score']
params = random_result.cv_results_['params']
for mean, std, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, std, param))
Fitting 3 folds for each of 9 candidates, totalling 27 fits 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.6692 - loss: 1.2555 - val_accuracy: 0.7255 - val_loss: 0.5643 Best: 0.714117 using {'optimizer__learning_rate': 0.1, 'batch_size': 32}
estimator_v4 = create_model_v4(input_dim=x_train.shape[1]) # Pass input_dim explicitly
estimator_v4.summary()
Model: "sequential_1"
Layer (type)           Output Shape    Param #
dense_5 (Dense)        (None, 256)       1,280
dropout_1 (Dropout)    (None, 256)           0
dense_6 (Dense)        (None, 128)      32,896
dense_7 (Dense)        (None, 64)        8,256
dense_8 (Dense)        (None, 32)        2,080
dense_9 (Dense)        (None, 1)            33
Total params: 44,545 (174.00 KB)
Trainable params: 44,545 (174.00 KB)
Non-trainable params: 0 (0.00 B)
optimizer = tf.keras.optimizers.Adam()
estimator_v4.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_5 = estimator_v4.fit(x_train, y_train, epochs=50, batch_size = 32, verbose=1,validation_split=0.2)
Epoch 1/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6977 - loss: 0.6653 - val_accuracy: 0.7035 - val_loss: 0.5430
[epochs 2–49 omitted: training loss falls from ~0.53 to ~0.46; val_loss declines from ~0.54 to ~0.44]
Epoch 50/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7560 - loss: 0.4562 - val_accuracy: 0.7550 - val_loss: 0.4411
#Plotting Train Loss vs Validation Loss
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- The model loss graph shows a steady decline in both training and validation loss over epochs, indicating effective learning.
- The validation loss remains close to the training loss, suggesting minimal overfitting.
- The downward trend implies the model is improving, but further tuning might reduce the gap for better generalization.
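Since the curves above are still trending down at epoch 50, one common way to decide when to stop is "patience" on the validation loss. The sketch below is a minimal, pure-Python illustration of that logic (Keras provides it as the `EarlyStopping` callback; this is not the notebook's code):

```python
# Minimal sketch of early-stopping "patience" logic (illustrative only;
# in Keras this is tf.keras.callbacks.EarlyStopping).
def early_stop_epoch(val_losses, patience=5):
    """Return the epoch (0-based) at which training would stop,
    or len(val_losses) if patience never runs out."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # new best: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)

# Example: loss improves, then plateaus long enough to trigger stopping
losses = [0.54, 0.52, 0.51, 0.50, 0.50, 0.51, 0.50, 0.52, 0.51]
print(early_stop_epoch(losses, patience=5))
```

With `restore_best_weights=True`, Keras's callback would also roll the model back to the epoch with the lowest validation loss.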
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat4 = estimator_v4.predict(x_test)
# keep probabilities for the positive outcome only
yhat4 = yhat4[:, 0]
# calculate roc curves
fpr, tpr, thresholds4 = roc_curve(y_test, yhat4)
# calculate the g-mean for each threshold
gmeans4 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans4)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds4[ix], gmeans4[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step Best Threshold=0.313894, G-Mean=0.735
- The curve sits well above the chance-level diagonal, indicating real predictive power.
- The best threshold (≈0.314) balances sensitivity and specificity, with a G-Mean of 0.735, suggesting reasonable but improvable performance.
y_pred_e4=estimator_v4.predict(x_test)
y_pred_e4 = (y_pred_e4 > thresholds4[ix])
y_pred_e4
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
array([[False],
[ True],
[False],
...,
[ True],
[False],
[False]])
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm4=confusion_matrix(y_test, y_pred_e4)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm4,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
#Accuracy as per the classification report
from sklearn import metrics
cr4=metrics.classification_report(y_test,y_pred_e4)
print(cr4)
              precision    recall  f1-score   support

           0       0.92      0.63      0.74      4384
           1       0.48      0.86      0.61      1755

    accuracy                           0.69      6139
   macro avg       0.70      0.74      0.68      6139
weighted avg       0.79      0.69      0.71      6139
- The confusion matrix and classification report show an overall accuracy of 69%, with a better balance between recall for the two classes.
- True Negatives (~45%): correctly identified non-liver patients.
- True Positives (~25%): correctly identified liver patients.
- False Positives (~26%): non-liver patients misclassified as liver patients.
- False Negatives (~4%): liver patients misclassified as non-liver patients.
- While recall for liver patients (86%) has improved, precision (48%) remains low, indicating a high number of false positives. The model favors detecting liver disease at the cost of misclassifying some healthy patients, which may be acceptable in medical diagnosis, where missing an actual liver patient is the costlier error.
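To make the precision/recall trade-off above concrete, the sketch below back-computes the metrics from approximate confusion-matrix counts. The counts are reconstructed from the rounded report figures (supports of 4384 and 1755), so they are illustrative rather than the exact cell values:

```python
# Approximate cell counts back-computed from the rounded report:
# class-0 recall ~0.63 of 4384, class-1 recall ~0.86 of 1755.
tn, fp = 2762, 1622
fn, tp = 246, 1509

precision_pos = tp / (tp + fp)               # ~0.48: many false positives
recall_pos = tp / (tp + fn)                  # ~0.86: few liver patients missed
accuracy = (tp + tn) / (tp + tn + fp + fn)   # ~0.69-0.70 with these counts

print(round(precision_pos, 2), round(recall_pos, 2))
```

This shows why accuracy alone is misleading here: precision for the positive class is driven down by the 1622 false positives even though recall is high.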
Model 6 - Grid Search CV¶
- Parameters
- Type of Architecture
- Number of Layers
- Number of Neurons in a layer
- Regularization hyperparameters
- Learning Rate
- Type of Optimizer
- Dropout Rate
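Under the hood, grid search simply enumerates the Cartesian product of the candidate value lists and cross-validates each combination. A minimal sketch of that enumeration (scoring and CV folds omitted; `GridSearchCV` does the rest):

```python
from itertools import product

# The same grid used below: 3 learning rates x 3 batch sizes.
param_grid = {'learning_rate': [0.01, 0.1, 0.001],
              'batch_size': [32, 64, 128]}

keys = list(param_grid)
candidates = [dict(zip(keys, values))
              for values in product(*param_grid.values())]

# 9 candidates; with 3-fold CV that is the "27 fits" reported below.
print(len(candidates))
```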
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
def create_model_v5():
    np.random.seed(1337)
    model6 = Sequential()
    model6.add(Dense(256, activation='relu', input_dim=x_train.shape[1]))
    model6.add(Dropout(0.3))
    #model6.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
    model6.add(Dense(128, activation='relu'))
    model6.add(Dropout(0.3))
    model6.add(Dense(64, activation='relu'))
    model6.add(Dropout(0.2))
    #model6.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
    #model6.add(Dropout(0.3))
    model6.add(Dense(32, activation='relu'))
    model6.add(Dense(1, activation='sigmoid'))
    # compile model
    optimizer = tf.keras.optimizers.Adam()
    model6.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model6
# Import necessary modules
from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier
# Define Keras estimator
keras_estimator = KerasClassifier(build_fn=create_model_v5, input_dim=x_train.shape[1], optimizer="Adam", verbose=1)
# Define the grid search parameters
learn_rate = [0.01, 0.1, 0.001]
batch_size = [32, 64, 128]
param_grid = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)
kfold_splits = 3
grid = GridSearchCV(estimator=keras_estimator,
                    verbose=1,
                    cv=kfold_splits,
                    param_grid=param_grid,
                    n_jobs=-1)
import time
# store starting time
begin = time.time()
grid_result = grid.fit(x_train, y_train, verbose=1)
# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
time.sleep(1)
# store end time
end = time.time()
# total time taken
print(f"Total runtime of the program is {end - begin}")
Fitting 3 folds for each of 9 candidates, totalling 27 fits 192/192 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6627 - loss: 2.0261 Best: 0.716153 using {'batch_size': 128, 'optimizer__learning_rate': 0.01} Total runtime of the program is 113.1963906288147
estimator_v5=create_model_v5()
estimator_v5.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ dense_5 (Dense) │ (None, 256) │ 1,280 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dropout_1 (Dropout) │ (None, 256) │ 0 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_6 (Dense) │ (None, 128) │ 32,896 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dropout_2 (Dropout) │ (None, 128) │ 0 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_7 (Dense) │ (None, 64) │ 8,256 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dropout_3 (Dropout) │ (None, 64) │ 0 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_8 (Dense) │ (None, 32) │ 2,080 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_9 (Dense) │ (None, 1) │ 33 │ └──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 44,545 (174.00 KB)
Trainable params: 44,545 (174.00 KB)
Non-trainable params: 0 (0.00 B)
optimizer = tf.keras.optimizers.Adam(grid_result.best_params_['optimizer__learning_rate'])
estimator_v5.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_6 = estimator_v5.fit(x_train, y_train, epochs=50, batch_size=grid_result.best_params_['batch_size'], verbose=1, validation_split=0.2)
Epoch 1/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6876 - loss: 1.1544 - val_accuracy: 0.7052 - val_loss: 0.5272 Epoch 2/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7147 - loss: 0.5382 - val_accuracy: 0.7052 - val_loss: 0.5469 Epoch 3/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5338 - val_accuracy: 0.7052 - val_loss: 0.5363 Epoch 4/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7160 - loss: 0.5345 - val_accuracy: 0.7052 - val_loss: 0.5368 Epoch 5/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7163 - loss: 0.5301 - val_accuracy: 0.7052 - val_loss: 0.5370 Epoch 6/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5345 - val_accuracy: 0.7052 - val_loss: 0.5499 Epoch 7/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5327 - val_accuracy: 0.7052 - val_loss: 0.5558 Epoch 8/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5356 - val_accuracy: 0.7052 - val_loss: 0.5462 Epoch 9/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7161 - loss: 0.5379 - val_accuracy: 0.7052 - val_loss: 0.5586 Epoch 10/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7161 - loss: 0.5653 - val_accuracy: 0.7052 - val_loss: 0.5620 Epoch 11/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5490 - val_accuracy: 0.7052 - val_loss: 0.5552 Epoch 12/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5495 - val_accuracy: 0.7052 - val_loss: 0.5292 Epoch 13/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7163 - loss: 0.5357 - val_accuracy: 0.7052 - val_loss: 0.5268 Epoch 14/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7156 - loss: 0.5379 - val_accuracy: 0.7052 - val_loss: 0.5308 Epoch 15/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7160 - loss: 0.5440 - val_accuracy: 0.7052 - val_loss: 0.5300 Epoch 16/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - 
loss: 0.5350 - val_accuracy: 0.7052 - val_loss: 0.5164 Epoch 17/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5387 - val_accuracy: 0.7052 - val_loss: 0.5182 Epoch 18/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7161 - loss: 0.5352 - val_accuracy: 0.7052 - val_loss: 0.5480 Epoch 19/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7161 - loss: 0.5437 - val_accuracy: 0.7052 - val_loss: 0.5279 Epoch 20/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7162 - loss: 0.5358 - val_accuracy: 0.7052 - val_loss: 0.5642 Epoch 21/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5567 - val_accuracy: 0.7052 - val_loss: 0.5521 Epoch 22/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7161 - loss: 0.5640 - val_accuracy: 0.7052 - val_loss: 0.5535 Epoch 23/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5538 - val_accuracy: 0.7052 - val_loss: 0.5551 Epoch 24/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7162 - loss: 0.5667 - val_accuracy: 0.7052 - val_loss: 0.5678 Epoch 25/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5676 - val_accuracy: 0.7052 - val_loss: 0.5571 Epoch 26/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5822 - val_accuracy: 0.7052 - val_loss: 0.5763 Epoch 27/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7162 - loss: 0.5697 - val_accuracy: 0.7052 - val_loss: 0.5969 Epoch 28/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5856 - val_accuracy: 0.7052 - val_loss: 0.5822 Epoch 29/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5815 - val_accuracy: 0.7052 - val_loss: 0.5603 Epoch 30/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5753 - val_accuracy: 0.7052 - val_loss: 0.5835 Epoch 31/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5774 - val_accuracy: 0.7052 - val_loss: 0.5989 Epoch 
32/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5914 - val_accuracy: 0.7052 - val_loss: 0.5998 Epoch 33/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5875 - val_accuracy: 0.7052 - val_loss: 0.5933 Epoch 34/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5810 - val_accuracy: 0.7052 - val_loss: 0.5982 Epoch 35/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5861 - val_accuracy: 0.7052 - val_loss: 0.5906 Epoch 36/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5832 - val_accuracy: 0.7052 - val_loss: 0.5923 Epoch 37/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5846 - val_accuracy: 0.7052 - val_loss: 0.6019 Epoch 38/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.7162 - loss: 0.5910 - val_accuracy: 0.7052 - val_loss: 0.6019 Epoch 39/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7162 - loss: 0.5910 - val_accuracy: 0.7052 - val_loss: 0.6019 Epoch 40/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5909 - val_accuracy: 0.7052 - val_loss: 0.6019 Epoch 41/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5908 - val_accuracy: 0.7052 - val_loss: 0.6019 Epoch 42/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7162 - loss: 0.5906 - val_accuracy: 0.7052 - val_loss: 0.6020 Epoch 43/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7162 - loss: 0.5907 - val_accuracy: 0.7052 - val_loss: 0.6019 Epoch 44/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5906 - val_accuracy: 0.7052 - val_loss: 0.6020 Epoch 45/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5909 - val_accuracy: 0.7052 - val_loss: 0.6020 Epoch 46/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.6032 - val_accuracy: 0.7052 - val_loss: 0.6053 Epoch 47/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 
0.7162 - loss: 0.5921 - val_accuracy: 0.7052 - val_loss: 0.6032 Epoch 48/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.7162 - loss: 0.5915 - val_accuracy: 0.7052 - val_loss: 0.6031 Epoch 49/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5916 - val_accuracy: 0.7052 - val_loss: 0.6030 Epoch 50/50 614/614 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7162 - loss: 0.5915 - val_accuracy: 0.7052 - val_loss: 0.6030
#Plotting Train Loss vs Validation Loss
plt.plot(history_6.history['loss'])
plt.plot(history_6.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- Unlike the previous run, training accuracy stays flat at ~0.716 (the majority-class rate) from the first epoch, and validation loss drifts upward from ~0.53 to ~0.60.
- This suggests the model has collapsed to predicting the majority class; the tuned learning rate of 0.01 appears too aggressive for stable training here, so a smaller learning rate (or a learning-rate schedule) is worth retrying.
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat5 = estimator_v5.predict(x_test)
# keep probabilities for the positive outcome only
yhat5 = yhat5[:, 0]
# calculate roc curves
fpr, tpr, thresholds5 = roc_curve(y_test, yhat5)
# calculate the g-mean for each threshold
gmeans5 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans5)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds5[ix], gmeans5[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step Best Threshold=0.280138, G-Mean=0.160
- Poor model performance: The ROC curve closely follows the diagonal, indicating near-random classification.
- Low G-Mean (0.160): Suggests an imbalanced sensitivity and specificity.
- Threshold ineffective: The best threshold does not improve class separation significantly.
- Needs improvement: Consider hyperparameter tuning and addressing class imbalance.
y_pred_e5=estimator_v5.predict(x_test)
y_pred_e5 = (y_pred_e5 > thresholds5[ix])
y_pred_e5
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
array([[False],
[False],
[False],
...,
[ True],
[ True],
[ True]])
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm5=confusion_matrix(y_test, y_pred_e5)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm5,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
#Accuracy as per the classification report
from sklearn import metrics
cr5=metrics.classification_report(y_test,y_pred_e5)
print(cr5)
              precision    recall  f1-score   support

           0       0.71      1.00      0.83      4384
           1       0.00      0.00      0.00      1755

    accuracy                           0.71      6139
   macro avg       0.36      0.50      0.42      6139
weighted avg       0.51      0.71      0.59      6139
- Severe class imbalance in predictions: The model predicts nearly all cases as "Non-Liver Patient," failing to identify any "Liver Patients."
- Poor recall & precision for class 1: The recall and precision for the "Liver Patient" class are both 0.00, meaning the model does not detect any actual positive cases.
- High accuracy but misleading: The model achieves 71% accuracy, but this is due to predicting the majority class (Non-Liver Patient) rather than true predictive power.
- Macro & weighted averages are low: The macro average F1-score is 0.42, indicating poor performance in distinguishing between classes.
- Urgent need for resampling: Consider oversampling (e.g., SMOTE) or rebalancing the dataset to improve class 1 predictions.
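The core idea behind SMOTE, mentioned above, is to synthesize new minority-class samples by interpolating between an existing minority point and one of its minority-class neighbours. The sketch below illustrates that idea in plain NumPy on toy data (the real implementation is imbalanced-learn's `SMOTE` class, which picks partners via k-nearest neighbours rather than at random):

```python
import numpy as np

rng = np.random.default_rng(42)
minority = rng.normal(size=(10, 3))          # toy minority-class samples

def smote_like(X, n_new, rng):
    """Generate n_new synthetic points by linear interpolation
    between random pairs of minority samples (SMOTE-style sketch)."""
    i = rng.integers(0, len(X), size=n_new)  # anchor points
    j = rng.integers(0, len(X), size=n_new)  # partners (real SMOTE: k-NN)
    lam = rng.random((n_new, 1))             # interpolation factors in [0, 1)
    return X[i] + lam * (X[j] - X[i])

synthetic = smote_like(minority, 20, rng)
print(synthetic.shape)  # (20, 3)
```

Each synthetic point lies on the segment between two real minority samples, so the oversampled class stays inside the region the minority data already occupies.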
Dask¶
- Dask is another library, used in industry to speed up hyperparameter tuning through its parallelized computing model.
- Dask-ML provides a GridSearchCV implementation with an interface similar to scikit-learn's GridSearchCV.
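The speedup comes from the fact that every (candidate, fold) fit in a grid search is independent, so they can run concurrently. The sketch below illustrates that structure with the standard library rather than Dask, using a dummy `evaluate` function as a stand-in for "fit + score on one fold":

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params, fold):
    # Stand-in for fitting one candidate on one CV fold and scoring it.
    return (params, fold, sum(params) + fold)   # dummy score

candidates = [(lr, bs) for lr, bs in product([0.01, 0.1], [32, 64])]
folds = range(3)

# All (candidate, fold) jobs are independent, so a pool can run them
# concurrently; Dask does the same across threads, processes, or a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda job: evaluate(*job),
                            product(candidates, folds)))
print(len(results))  # 4 candidates x 3 folds = 12 jobs
```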
# pip install dask==2024.12.1 dask-ml scikit-learn==1.2.2
import dask_ml
import dask
import sklearn
print(dask.__version__)
print(dask_ml.__version__)
print(sklearn.__version__)
2024.12.1 2024.4.4 1.4.2
# importing library
from dask_ml.model_selection import GridSearchCV as DaskGridSearchCV
def create_model_v6():
    np.random.seed(1337)
    model7 = Sequential()
    model7.add(Dense(256, activation='relu', input_dim=x_train.shape[1]))
    model7.add(Dropout(0.3))
    #model7.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
    model7.add(Dense(128, activation='relu'))
    model7.add(Dropout(0.3))
    model7.add(Dense(64, activation='relu'))
    model7.add(Dropout(0.2))
    #model7.add(Dense(32,activation='relu',kernel_initializer='he_uniform'))
    #model7.add(Dropout(0.3))
    model7.add(Dense(32, activation='relu'))
    model7.add(Dense(1, activation='sigmoid'))
    # compile model
    optimizer = tf.keras.optimizers.Adam()
    model7.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model7
keras_estimator = KerasClassifier(build_fn=create_model_v6, verbose=1)
# define the grid search parameters
learn_rate = [0.01, 0.1, 0.001]
batch_size = [32, 64, 128]
param_grid = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)
kfold_splits = 3
# note: named dask_grid so it does not shadow the imported dask module
dask_grid = DaskGridSearchCV(estimator=keras_estimator,
                             cv=kfold_splits,
                             param_grid=param_grid,
                             n_jobs=-1)
import time
# store starting time
begin = time.time()
dask_result = dask_grid.fit(x_train, y_train, validation_split=0.2, verbose=1)
# Summarize results
print("Best: %f using %s" % (dask_result.best_score_, dask_result.best_params_))
means = dask_result.cv_results_['mean_test_score']
stds = dask_result.cv_results_['std_test_score']
params = dask_result.cv_results_['params']
time.sleep(1)
# store end time
end = time.time()
# total time taken
print(f"Total runtime of the program is {end - begin}")
103/103 ━━━━━━━━━━━━━━━━━━━━ 7s 15ms/step - accuracy: 0.6442 - loss: 2.1003 - val_accuracy: 0.7086 - val_loss: 0.5592 103/103 ━━━━━━━━━━━━━━━━━━━━ 7s 14ms/step - accuracy: 0.6472 - loss: 1.8261 - val_accuracy: 0.7065 - val_loss: 0.5527 64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step 64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.6611 - loss: 1.3498 - val_accuracy: 0.7086 - val_loss: 0.5541 410/410 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.6608 - loss: 1.5886 - val_accuracy: 0.7086 - val_loss: 0.5540 256/256 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step 256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 5s 5ms/step - accuracy: 0.6337 - loss: 2.7663 - val_accuracy: 0.6970 - val_loss: 0.5926 256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step 103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - accuracy: 0.6384 - loss: 2.2203 - val_accuracy: 0.6973 - val_loss: 0.5958 64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step 205/205 ━━━━━━━━━━━━━━━━━━━━ 10s 15ms/step - accuracy: 0.6478 - loss: 1.5850 - val_accuracy: 0.7083 - val_loss: 0.5532 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step 205/205 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.6542 - loss: 1.7590 - val_accuracy: 0.7083 - val_loss: 0.5894 205/205 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6449 - loss: 1.6088 - val_accuracy: 0.7083 - val_loss: 0.5647 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step 103/103 ━━━━━━━━━━━━━━━━━━━━ 7s 15ms/step - accuracy: 0.6386 - loss: 2.6754 - val_accuracy: 0.7086 - val_loss: 0.5698 103/103 ━━━━━━━━━━━━━━━━━━━━ 6s 11ms/step - accuracy: 0.6544 - loss: 1.7676 - val_accuracy: 0.7083 - val_loss: 0.5713 64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step 64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.6523 - loss: 1.2845 - val_accuracy: 0.6991 - val_loss: 0.5829 410/410 ━━━━━━━━━━━━━━━━━━━━ 9s 12ms/step - accuracy: 0.6537 - loss: 1.0956 - val_accuracy: 0.7083 - val_loss: 0.5476 256/256 
━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step 256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 10ms/step - accuracy: 0.6631 - loss: 1.6120 - val_accuracy: 0.7181 - val_loss: 0.5913 103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 15ms/step - accuracy: 0.6461 - loss: 1.7021 - val_accuracy: 0.7083 - val_loss: 0.5598 256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step 64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step 205/205 ━━━━━━━━━━━━━━━━━━━━ 8s 14ms/step - accuracy: 0.6557 - loss: 1.7413 - val_accuracy: 0.7086 - val_loss: 0.5702 205/205 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.6432 - loss: 2.1345 - val_accuracy: 0.7059 - val_loss: 0.5771 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step 205/205 ━━━━━━━━━━━━━━━━━━━━ 10s 18ms/step - accuracy: 0.6560 - loss: 1.4004 - val_accuracy: 0.7049 - val_loss: 0.5510 103/103 ━━━━━━━━━━━━━━━━━━━━ 9s 24ms/step - accuracy: 0.6586 - loss: 1.6383 - val_accuracy: 0.7083 - val_loss: 0.5582 64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step 103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 22ms/step - accuracy: 0.6580 - loss: 1.5695 - val_accuracy: 0.7083 - val_loss: 0.5941 64/64 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 9s 14ms/step - accuracy: 0.6475 - loss: 1.3352 - val_accuracy: 0.6790 - val_loss: 0.5909 256/256 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 9ms/step - accuracy: 0.6553 - loss: 1.4285 - val_accuracy: 0.7049 - val_loss: 0.5654 256/256 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step 103/103 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - accuracy: 0.6196 - loss: 4.0593 - val_accuracy: 0.6836 - val_loss: 0.6159 64/64 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step 410/410 ━━━━━━━━━━━━━━━━━━━━ 8s 11ms/step - accuracy: 0.6448 - loss: 1.4892 - val_accuracy: 0.7083 - val_loss: 0.5516 256/256 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step 205/205 ━━━━━━━━━━━━━━━━━━━━ 7s 12ms/step - accuracy: 0.6533 - loss: 1.6598 - val_accuracy: 0.6002 - val_loss: 0.6119 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step 205/205 
━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.6465 - loss: 1.4718 - val_accuracy: 0.7059 - val_loss: 0.5788 128/128 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step 205/205 ━━━━━━━━━━━━━━━━━━━━ 6s 9ms/step - accuracy: 0.6384 - loss: 2.3445 - val_accuracy: 0.7080 - val_loss: 0.5602 128/128 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step 614/614 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.6435 - loss: 1.8018 - val_accuracy: 0.7052 - val_loss: 0.5565 Best: 0.718271 using {'batch_size': 32, 'optimizer__learning_rate': 0.01} Total runtime of the program is 138.1254551410675
Model 8 - Keras Tuner¶
# ## Install Keras Tuner
# !pip install keras-tuner
# from tensorflow import keras
# from tensorflow.keras import layers
# from kerastuner.tuners import RandomSearch
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
- Hyperparameters
- How many hidden layers should the model have?
- How many neurons should the model have in each hidden layer?
- Learning Rate
def build_model(h):
    model8 = keras.Sequential()
    for i in range(h.Int('num_layers', 2, 10)):
        model8.add(layers.Dense(units=h.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=256,
                                            step=32),
                                activation='relu'))
    model8.add(layers.Dense(1, activation='sigmoid'))
    model8.compile(
        optimizer=keras.optimizers.Adam(
            h.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=['accuracy'])
    return model8
Initialize a tuner (here, RandomSearch). We use objective to specify the metric used to select the best models, and max_trials to cap the number of different configurations to try.
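Rather than enumerating every combination as grid search does, random search draws max_trials random configurations from the space. A minimal sketch of that sampling for the space defined in build_model above (illustrative only; Keras Tuner also handles conditional hyperparameters and result tracking):

```python
import random

random.seed(42)

def sample_config():
    # Mirrors build_model's space: 2-10 layers, 32-256 units per layer
    # in steps of 32, and a choice of three learning rates.
    num_layers = random.randint(2, 10)
    return {
        'num_layers': num_layers,
        'units': [random.randrange(32, 257, 32) for _ in range(num_layers)],
        'learning_rate': random.choice([1e-2, 1e-3, 1e-4]),
    }

trials = [sample_config() for _ in range(5)]   # max_trials=5
print(len(trials))
```

Each sampled configuration would then be built, trained executions_per_trial times, and scored on the tuner's objective.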
!pip install keras-tuner --no-cache-dir
Collecting keras-tuner
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
Successfully installed keras-tuner-1.4.7 kt-legacy-1.0.5
from keras_tuner import RandomSearch
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    project_name='Job_'
)
tuner.search_space_summary()
Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 10, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
### Searching for the best model on x_train and y_train
tuner.search(x_train, y_train,
             epochs=5,
             validation_split=0.2)
Trial 5 Complete [00h 01m 31s] val_accuracy: 0.7093599239985148 Best val_accuracy So Far: 0.7294509013493856 Total elapsed time: 00h 07m 34s
## Printing the best models with their hyperparameters
tuner.results_summary()
Results summary Results in ./Job_ Showing 10 best trials Objective(name="val_accuracy", direction="max") Trial 1 summary Hyperparameters: num_layers: 5 units_0: 160 units_1: 160 learning_rate: 0.001 units_2: 224 units_3: 128 units_4: 224 units_5: 64 units_6: 160 units_7: 64 units_8: 32 Score: 0.7294509013493856 Trial 0 summary Hyperparameters: num_layers: 9 units_0: 224 units_1: 96 learning_rate: 0.001 units_2: 32 units_3: 32 units_4: 32 units_5: 32 units_6: 32 units_7: 32 units_8: 32 Score: 0.7245638966560364 Trial 3 summary Hyperparameters: num_layers: 5 units_0: 32 units_1: 64 learning_rate: 0.01 units_2: 96 units_3: 256 units_4: 256 units_5: 160 units_6: 192 units_7: 224 units_8: 224 Score: 0.7179121772448221 Trial 2 summary Hyperparameters: num_layers: 9 units_0: 192 units_1: 64 learning_rate: 0.001 units_2: 160 units_3: 32 units_4: 224 units_5: 32 units_6: 256 units_7: 96 units_8: 192 Score: 0.7177085280418396 Trial 4 summary Hyperparameters: num_layers: 10 units_0: 128 units_1: 32 learning_rate: 0.0001 units_2: 160 units_3: 160 units_4: 160 units_5: 224 units_6: 96 units_7: 128 units_8: 96 units_9: 32 Score: 0.7093599239985148
Rebuilding the best model found by Keras Tuner
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
model8 = Sequential()
model8.add(Dense(160,activation='relu',kernel_initializer='he_uniform',input_dim = x_train.shape[1]))
model8.add(Dense(160,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model8.add(Dense(1, activation = 'sigmoid'))
model8.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ dense (Dense) │ (None, 160) │ 800 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_1 (Dense) │ (None, 160) │ 25,760 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_2 (Dense) │ (None, 224) │ 36,064 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_3 (Dense) │ (None, 128) │ 28,800 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_4 (Dense) │ (None, 224) │ 28,896 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_5 (Dense) │ (None, 1) │ 225 │ └──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 120,545 (470.88 KB)
Trainable params: 120,545 (470.88 KB)
Non-trainable params: 0 (0.00 B)
optimizer = tf.keras.optimizers.Adam(0.001)
model8.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
history_8 = model8.fit(x_train,y_train,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6335 - loss: 18.4642 - val_accuracy: 0.6915 - val_loss: 0.7759 Epoch 2/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.6597 - loss: 0.8441 - val_accuracy: 0.7033 - val_loss: 0.7323 Epoch 3/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.6901 - loss: 0.6196 - val_accuracy: 0.7052 - val_loss: 0.8244 Epoch 4/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6845 - loss: 0.6032 - val_accuracy: 0.7060 - val_loss: 0.5217 Epoch 5/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7033 - loss: 0.5489 - val_accuracy: 0.6579 - val_loss: 0.5630 Epoch 6/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7006 - loss: 0.5463 - val_accuracy: 0.7206 - val_loss: 0.5268 Epoch 7/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.6923 - loss: 0.5465 - val_accuracy: 0.6561 - val_loss: 0.5552 Epoch 8/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - accuracy: 0.6987 - loss: 0.5268 - val_accuracy: 0.7080 - val_loss: 0.5194 Epoch 9/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6947 - loss: 0.5461 - val_accuracy: 0.7015 - val_loss: 0.5441 Epoch 10/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7005 - loss: 0.5404 - val_accuracy: 0.7031 - val_loss: 0.5813 Epoch 11/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.7024 - loss: 0.5324 - val_accuracy: 0.5756 - val_loss: 0.6261 Epoch 12/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.6957 - loss: 0.5336 - val_accuracy: 0.6591 - val_loss: 0.5600 Epoch 13/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7037 - loss: 0.5196 - val_accuracy: 0.6860 - val_loss: 0.5519 Epoch 14/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7047 - loss: 0.5270 - val_accuracy: 0.6966 - val_loss: 0.5301 Epoch 15/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7088 - loss: 0.5153 - val_accuracy: 0.6516 - val_loss: 0.5585 Epoch 16/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 
0.7051 - loss: 0.5204 - val_accuracy: 0.7001 - val_loss: 0.5078 Epoch 17/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7109 - loss: 0.5082 - val_accuracy: 0.7143 - val_loss: 0.5074 Epoch 18/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7172 - loss: 0.5056 - val_accuracy: 0.7235 - val_loss: 0.5031 Epoch 19/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7167 - loss: 0.5049 - val_accuracy: 0.7290 - val_loss: 0.5072 Epoch 20/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.7114 - loss: 0.5065 - val_accuracy: 0.6995 - val_loss: 0.5272 Epoch 21/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7151 - loss: 0.5109 - val_accuracy: 0.7001 - val_loss: 0.5666 Epoch 22/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7087 - loss: 0.5160 - val_accuracy: 0.7015 - val_loss: 0.5050 Epoch 23/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7058 - loss: 0.5174 - val_accuracy: 0.7001 - val_loss: 0.5072 Epoch 24/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7114 - loss: 0.5022 - val_accuracy: 0.6984 - val_loss: 0.5090 Epoch 25/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.7114 - loss: 0.5038 - val_accuracy: 0.7033 - val_loss: 0.5285 Epoch 26/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7120 - loss: 0.5175 - val_accuracy: 0.7031 - val_loss: 0.5202 Epoch 27/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7148 - loss: 0.5138 - val_accuracy: 0.7137 - val_loss: 0.5103 Epoch 28/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.7179 - loss: 0.5037 - val_accuracy: 0.7076 - val_loss: 0.5040 Epoch 29/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7154 - loss: 0.5032 - val_accuracy: 0.7082 - val_loss: 0.5097 Epoch 30/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7162 - loss: 0.5020 - val_accuracy: 0.7186 - val_loss: 0.5034 Epoch 31/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.7171 - loss: 0.4992 - val_accuracy: 0.7267 - val_loss: 
0.5170 Epoch 32/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7209 - loss: 0.5053 - val_accuracy: 0.7166 - val_loss: 0.5118 Epoch 33/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7175 - loss: 0.4999 - val_accuracy: 0.7192 - val_loss: 0.4982 Epoch 34/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 8ms/step - accuracy: 0.7176 - loss: 0.5007 - val_accuracy: 0.7243 - val_loss: 0.4991 Epoch 35/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7225 - loss: 0.5011 - val_accuracy: 0.6984 - val_loss: 0.5176 Epoch 36/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7173 - loss: 0.5001 - val_accuracy: 0.7243 - val_loss: 0.4972 Epoch 37/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7262 - loss: 0.5007 - val_accuracy: 0.6416 - val_loss: 0.5675 Epoch 38/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.7078 - loss: 0.5098 - val_accuracy: 0.7233 - val_loss: 0.5030 Epoch 39/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.7228 - loss: 0.4990 - val_accuracy: 0.7208 - val_loss: 0.4956 Epoch 40/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7247 - loss: 0.4975 - val_accuracy: 0.7227 - val_loss: 0.4969 Epoch 41/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.7264 - loss: 0.4950 - val_accuracy: 0.7263 - val_loss: 0.4954 Epoch 42/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.7239 - loss: 0.4950 - val_accuracy: 0.7249 - val_loss: 0.5019 Epoch 43/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 11ms/step - accuracy: 0.7281 - loss: 0.4924 - val_accuracy: 0.7182 - val_loss: 0.4960 Epoch 44/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.7272 - loss: 0.4944 - val_accuracy: 0.7166 - val_loss: 0.5015 Epoch 45/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7298 - loss: 0.4922 - val_accuracy: 0.7127 - val_loss: 0.5001 Epoch 46/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.7263 - loss: 0.4922 - val_accuracy: 0.7102 - val_loss: 0.5043 Epoch 47/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 6s 18ms/step 
- accuracy: 0.7265 - loss: 0.4908 - val_accuracy: 0.7137 - val_loss: 0.4998 Epoch 48/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 7s 7ms/step - accuracy: 0.7289 - loss: 0.4912 - val_accuracy: 0.7174 - val_loss: 0.5078 Epoch 49/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7303 - loss: 0.4909 - val_accuracy: 0.7086 - val_loss: 0.4987 Epoch 50/50 307/307 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.7284 - loss: 0.4905 - val_accuracy: 0.7269 - val_loss: 0.4911
#Plotting Train Loss vs Validation Loss
plt.plot(history_8.history['loss'])
plt.plot(history_8.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- The training loss starts extremely high (about 18.5) and collapses within the first epoch, which points to poor weight initialization or unscaled inputs rather than overfitting.
- After this initial phase the validation loss stays relatively stable, suggesting the model generalizes reasonably well.
- The intermittent spikes in validation loss (e.g., around epochs 3 and 11) indicate a somewhat unstable training process.
- Tuning the learning rate, adding dropout, or adjusting the batch size could improve stability.
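One possible remedy for the instability noted above is to wrap training with early stopping and learning-rate reduction callbacks. The sketch below is illustrative: the monitor, patience, and factor values are assumptions, not settings used in this notebook.

```python
from tensorflow import keras

# Stop training when validation loss stops improving, restoring the best weights
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss',
                                           patience=5,
                                           restore_best_weights=True)

# Halve the learning rate when validation loss plateaus
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                              factor=0.5,
                                              patience=3,
                                              min_lr=1e-5)

# These would then be passed to fit, e.g.:
# model8.fit(x_train, y_train, epochs=50, validation_split=0.2,
#            callbacks=[early_stop, reduce_lr])
```

With `restore_best_weights=True`, the model keeps the weights from its best validation epoch rather than the final one.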
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat7 = model8.predict(x_test)
# keep probabilities for the positive outcome only
yhat7 = yhat7[:, 0]
# calculate roc curves
fpr, tpr, thresholds7 = roc_curve(y_test, yhat7)
# calculate the g-mean for each threshold
gmeans7 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans7)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds7[ix], gmeans7[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step Best Threshold=0.385339, G-Mean=0.717
- ROC Curve Analysis: The model performs significantly better than random chance (dashed line), indicating good classification capability.
- Best Threshold: Identified at 0.385339, optimizing the balance between True Positive Rate (TPR) and False Positive Rate (FPR).
- G-Mean: Achieved 0.717, suggesting a strong trade-off between sensitivity and specificity.
y_pred_e7=model8.predict(x_test)
y_pred_e7 = (y_pred_e7 > thresholds7[ix])
y_pred_e7
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
array([[False],
[ True],
[ True],
...,
[ True],
[False],
[False]])
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm7=confusion_matrix(y_test, y_pred_e7)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm7,
group_names=labels,
categories=categories,
cmap='Blues')
#Accuracy as per the classification report
from sklearn import metrics
cr7=metrics.classification_report(y_test,y_pred_e7)
print(cr7)
precision recall f1-score support
0 0.89 0.64 0.74 4384
1 0.47 0.80 0.59 1755
accuracy 0.69 6139
macro avg 0.68 0.72 0.67 6139
weighted avg 0.77 0.69 0.70 6139
- Non-liver patients (0) have high precision (0.89) but lower recall (0.64): most predicted non-liver cases are correct, but roughly a third of actual non-liver patients are missed.
- Liver patients (1) have low precision (0.47) but high recall (0.80): the model catches most actual liver cases, but many non-liver patients are misclassified as liver patients.
- Overall accuracy is 69% with a macro F1-score of 0.67, a moderate balance; the weighted F1-score (0.70) reflects the class imbalance. The model prioritizes detecting liver disease at the cost of more false positives.
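Besides threshold tuning, the imbalance can also be addressed during training via class weights. A minimal sketch using scikit-learn's `compute_class_weight`; the 700/300 label split here is illustrative, not this dataset's actual ratio.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels (illustrative only)
y_demo = np.array([0] * 700 + [1] * 300)

classes = np.unique(y_demo)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_demo)
class_weight = dict(zip(classes, weights))

# 'balanced' assigns n_samples / (n_classes * count_c) to each class c,
# so the minority class gets the larger weight.
print(class_weight)
```

The resulting dict could then be passed to Keras as `model.fit(..., class_weight=class_weight)` so misclassified minority samples incur a larger loss.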
Model 9 - SMOTE + Keras Tuner¶
# !pip install --upgrade --force-reinstall scikit-learn imbalanced-learn
##Applying SMOTE to the training data (the test set is left untouched)
from imblearn.over_sampling import SMOTE
from imblearn.over_sampling import SMOTENC  # alternative for datasets with categorical features
import sklearn
print(sklearn.__version__) # Check the installed version
smote=SMOTE(sampling_strategy='not majority')
X_sm , y_sm = smote.fit_resample(x_train,y_train)
1.4.2
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
def build_model_2(h):
    model9 = keras.Sequential()
    # add between 2 and 10 hidden layers, each with 32-256 units
    for i in range(h.Int('num_layers', 2, 10)):
        model9.add(layers.Dense(units=h.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=256,
                                            step=32),
                                activation='relu'))
    model9.add(layers.Dense(1, activation='sigmoid'))
    model9.compile(
        optimizer=keras.optimizers.Adam(
            h.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='binary_crossentropy',
        metrics=['accuracy'])
    return model9
tuner_2 = RandomSearch(
    build_model_2,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    project_name='Job_Switch')
tuner_2.search_space_summary()
Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 10, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 256, 'step': 32, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
tuner_2.search(X_sm, y_sm,
epochs=5,
validation_split = 0.2)
Trial 5 Complete [00h 01m 53s] val_accuracy: 0.5544149776299795 Best val_accuracy So Far: 0.583737293879191 Total elapsed time: 00h 09m 03s
tuner_2.results_summary()
Results summary Results in ./Job_Switch Showing 10 best trials Objective(name="val_accuracy", direction="max") Trial 3 summary Hyperparameters: num_layers: 5 units_0: 32 units_1: 64 learning_rate: 0.01 units_2: 96 units_3: 256 units_4: 256 units_5: 160 units_6: 192 units_7: 224 units_8: 224 Score: 0.583737293879191 Trial 2 summary Hyperparameters: num_layers: 9 units_0: 192 units_1: 64 learning_rate: 0.001 units_2: 160 units_3: 32 units_4: 224 units_5: 32 units_6: 256 units_7: 96 units_8: 192 Score: 0.5826917688051859 Trial 4 summary Hyperparameters: num_layers: 10 units_0: 128 units_1: 32 learning_rate: 0.0001 units_2: 160 units_3: 160 units_4: 160 units_5: 224 units_6: 96 units_7: 128 units_8: 96 units_9: 32 Score: 0.5544149776299795 Trial 1 summary Hyperparameters: num_layers: 5 units_0: 160 units_1: 160 learning_rate: 0.001 units_2: 224 units_3: 128 units_4: 224 units_5: 64 units_6: 160 units_7: 64 units_8: 32 Score: 0.5472388664881388 Trial 0 summary Hyperparameters: num_layers: 9 units_0: 224 units_1: 96 learning_rate: 0.001 units_2: 32 units_3: 32 units_4: 32 units_5: 32 units_6: 32 units_7: 32 units_8: 32 Score: 0.5050850709279379
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
model9 = Sequential()
model9.add(Dense(160,activation='relu',kernel_initializer='he_uniform',input_dim = x_train.shape[1]))
model9.add(Dense(160,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(128,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(224,activation='relu',kernel_initializer='he_uniform'))
model9.add(Dense(1, activation = 'sigmoid'))
#Compiling the ANN with Adam optimizer and binary cross entropy loss function
optimizer = tf.keras.optimizers.Adam(0.001)
model9.compile(loss='binary_crossentropy',optimizer=optimizer,metrics=['accuracy'])
model9.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ dense (Dense) │ (None, 160) │ 800 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_1 (Dense) │ (None, 160) │ 25,760 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_2 (Dense) │ (None, 224) │ 36,064 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_3 (Dense) │ (None, 128) │ 28,800 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_4 (Dense) │ (None, 224) │ 28,896 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_5 (Dense) │ (None, 1) │ 225 │ └──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 120,545 (470.88 KB)
Trainable params: 120,545 (470.88 KB)
Non-trainable params: 0 (0.00 B)
history_9 = model9.fit(X_sm,y_sm,batch_size=64,epochs=50,verbose=1,validation_split = 0.2)
Epoch 1/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 12s 12ms/step - accuracy: 0.5796 - loss: 9.5292 - val_accuracy: 0.0413 - val_loss: 1.6555 Epoch 2/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 8s 18ms/step - accuracy: 0.6332 - loss: 0.7373 - val_accuracy: 0.4256 - val_loss: 0.9889 Epoch 3/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 12ms/step - accuracy: 0.6364 - loss: 0.6541 - val_accuracy: 0.9011 - val_loss: 0.4919 Epoch 4/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 10ms/step - accuracy: 0.6426 - loss: 0.6220 - val_accuracy: 0.1459 - val_loss: 1.3278 Epoch 5/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 6s 14ms/step - accuracy: 0.6424 - loss: 0.6338 - val_accuracy: 0.4823 - val_loss: 0.8936 Epoch 6/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.6404 - loss: 0.6905 - val_accuracy: 0.6109 - val_loss: 0.7520 Epoch 7/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6626 - loss: 0.5749 - val_accuracy: 0.5690 - val_loss: 0.7843 Epoch 8/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.6596 - loss: 0.5625 - val_accuracy: 0.4732 - val_loss: 0.8491 Epoch 9/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6651 - loss: 0.5669 - val_accuracy: 0.2119 - val_loss: 0.9939 Epoch 10/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6558 - loss: 0.5752 - val_accuracy: 0.4079 - val_loss: 0.8870 Epoch 11/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6552 - loss: 0.5732 - val_accuracy: 0.3775 - val_loss: 0.9108 Epoch 12/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6655 - loss: 0.5665 - val_accuracy: 0.3534 - val_loss: 0.8681 Epoch 13/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6585 - loss: 0.5628 - val_accuracy: 0.5191 - val_loss: 0.8326 Epoch 14/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.6693 - loss: 0.5629 - val_accuracy: 0.1868 - val_loss: 0.9918 Epoch 15/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6654 - loss: 0.5637 - val_accuracy: 0.2167 - val_loss: 0.9838 Epoch 16/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 
0.6633 - loss: 0.5620 - val_accuracy: 0.5364 - val_loss: 0.8218 Epoch 17/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 6s 9ms/step - accuracy: 0.6712 - loss: 0.5548 - val_accuracy: 0.0867 - val_loss: 1.0600 Epoch 18/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6676 - loss: 0.5571 - val_accuracy: 0.2127 - val_loss: 0.9782 Epoch 19/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6728 - loss: 0.5511 - val_accuracy: 0.4855 - val_loss: 0.8493 Epoch 20/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.6681 - loss: 0.5556 - val_accuracy: 0.5192 - val_loss: 0.8154 Epoch 21/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6689 - loss: 0.5528 - val_accuracy: 0.3663 - val_loss: 0.8720 Epoch 22/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6726 - loss: 0.5465 - val_accuracy: 0.3380 - val_loss: 0.8698 Epoch 23/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.6766 - loss: 0.5463 - val_accuracy: 0.6012 - val_loss: 0.7625 Epoch 24/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6791 - loss: 0.5383 - val_accuracy: 0.5469 - val_loss: 0.7909 Epoch 25/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6787 - loss: 0.5373 - val_accuracy: 0.4872 - val_loss: 0.8019 Epoch 26/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.6789 - loss: 0.5359 - val_accuracy: 0.5056 - val_loss: 0.9265 Epoch 27/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6635 - loss: 0.5601 - val_accuracy: 0.3228 - val_loss: 0.8857 Epoch 28/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6782 - loss: 0.5407 - val_accuracy: 0.3855 - val_loss: 0.8394 Epoch 29/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.6843 - loss: 0.5367 - val_accuracy: 0.5415 - val_loss: 0.7454 Epoch 30/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6861 - loss: 0.5330 - val_accuracy: 0.3282 - val_loss: 0.8989 Epoch 31/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 7ms/step - accuracy: 0.6832 - loss: 0.5391 - val_accuracy: 0.4155 - val_loss: 0.8373 
Epoch 32/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6800 - loss: 0.5333 - val_accuracy: 0.4862 - val_loss: 0.7718 Epoch 33/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.6864 - loss: 0.5288 - val_accuracy: 0.6431 - val_loss: 0.6870 Epoch 34/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 7ms/step - accuracy: 0.6841 - loss: 0.5297 - val_accuracy: 0.5378 - val_loss: 0.7827 Epoch 35/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6851 - loss: 0.5295 - val_accuracy: 0.5031 - val_loss: 0.7671 Epoch 36/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 7s 10ms/step - accuracy: 0.6856 - loss: 0.5273 - val_accuracy: 0.4869 - val_loss: 0.7723 Epoch 37/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.6893 - loss: 0.5288 - val_accuracy: 0.5811 - val_loss: 0.7202 Epoch 38/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6901 - loss: 0.5239 - val_accuracy: 0.5346 - val_loss: 0.6777 Epoch 39/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 7ms/step - accuracy: 0.6861 - loss: 0.5307 - val_accuracy: 0.5800 - val_loss: 0.7230 Epoch 40/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6914 - loss: 0.5245 - val_accuracy: 0.4882 - val_loss: 0.8090 Epoch 41/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6880 - loss: 0.5264 - val_accuracy: 0.5930 - val_loss: 0.7148 Epoch 42/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step - accuracy: 0.6869 - loss: 0.5317 - val_accuracy: 0.5014 - val_loss: 0.7830 Epoch 43/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 4s 9ms/step - accuracy: 0.6866 - loss: 0.5231 - val_accuracy: 0.5895 - val_loss: 0.7439 Epoch 44/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6900 - loss: 0.5226 - val_accuracy: 0.5161 - val_loss: 0.7340 Epoch 45/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6887 - loss: 0.5197 - val_accuracy: 0.4712 - val_loss: 0.8267 Epoch 46/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6890 - loss: 0.5271 - val_accuracy: 0.5461 - val_loss: 0.6926 Epoch 47/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - 
accuracy: 0.6891 - loss: 0.5238 - val_accuracy: 0.5652 - val_loss: 0.7166 Epoch 48/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.6968 - loss: 0.5160 - val_accuracy: 0.6517 - val_loss: 0.6701 Epoch 49/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.6997 - loss: 0.5154 - val_accuracy: 0.6021 - val_loss: 0.7174 Epoch 50/50 439/439 ━━━━━━━━━━━━━━━━━━━━ 8s 13ms/step - accuracy: 0.6909 - loss: 0.5199 - val_accuracy: 0.5473 - val_loss: 0.7293
#Plotting Train Loss vs Validation Loss
plt.plot(history_9.history['loss'])
plt.plot(history_9.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
- Overfitting indication: training loss stays consistently below validation loss, and validation accuracy swings wildly (from about 0.04 to 0.90), pointing to an unreliable validation signal rather than ordinary noise.
- Initial convergence: the sharp drop in training loss over the first epochs indicates quick early learning.
- A likely cause of the fluctuation: `validation_split` takes the last 20% of the data without shuffling, and SMOTE appends its synthetic minority samples at the end, so the validation set is dominated by synthetic positive cases.
- Shuffling the resampled data before fitting, or passing a genuine hold-out set via `validation_data`, would give a more trustworthy validation curve.
from sklearn.metrics import roc_curve
from matplotlib import pyplot
# predict probabilities
yhat9 = model9.predict(x_test)
# keep probabilities for the positive outcome only
yhat9 = yhat9[:, 0]
# calculate roc curves
fpr, tpr, thresholds9 = roc_curve(y_test, yhat9)
# calculate the g-mean for each threshold
gmeans9 = np.sqrt(tpr * (1-fpr))
# locate the index of the largest g-mean
ix = np.argmax(gmeans9)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds9[ix], gmeans9[ix]))
# plot the roc curve for the model
pyplot.plot([0,1], [0,1], linestyle='--', label='Chance Level')
pyplot.plot(fpr, tpr, marker='.')
pyplot.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
pyplot.legend()
# show the plot
pyplot.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step Best Threshold=0.391903, G-Mean=0.707
- ROC Curve Analysis: The model performs better than random guessing but has room for improvement.
- Best Threshold: 0.391903, optimizing the balance between sensitivity and specificity.
- G-Mean: 0.707, indicating moderate classification performance.
y_pred_e9=model9.predict(x_test)
y_pred_e9 = (y_pred_e9 > thresholds9[ix])
y_pred_e9
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step
array([[False],
[ True],
[ True],
...,
[ True],
[False],
[False]])
#Calculating the confusion matrix
from sklearn.metrics import confusion_matrix
cm9=confusion_matrix(y_test, y_pred_e9)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Non-Liver Patient','Liver Patient']
make_confusion_matrix(cm9,
group_names=labels,
categories=categories,
cmap='Blues')
#Accuracy as per the classification report
from sklearn import metrics
cr9=metrics.classification_report(y_test,y_pred_e9)
print(cr9)
precision recall f1-score support
0 0.90 0.59 0.72 4384
1 0.45 0.84 0.59 1755
accuracy 0.66 6139
macro avg 0.68 0.72 0.65 6139
weighted avg 0.77 0.66 0.68 6139
- The model favors identifying liver patients (high recall for class 1) but misclassifies many non-liver patients (high false positives).
- Precision for liver patients is low, meaning many non-liver cases are misclassified as liver.
Model 10 - Grid Search CV¶
backend.clear_session()
np.random.seed(42)
import random
random.seed(42)
tf.random.set_seed(42)
def create_model_v7():
    np.random.seed(1337)
    model10 = Sequential()
    model10.add(Dense(256, activation='relu', input_dim=x_train.shape[1]))
    model10.add(Dropout(0.3))
    model10.add(Dense(128, activation='relu'))
    model10.add(Dropout(0.3))
    model10.add(Dense(64, activation='relu'))
    model10.add(Dropout(0.2))
    model10.add(Dense(32, activation='relu'))
    model10.add(Dense(1, activation='sigmoid'))
    # compile model
    optimizer = tf.keras.optimizers.Adam()
    model10.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model10
keras_estimator = KerasClassifier(build_fn=create_model_v7, verbose=1)
# define the grid search parameters
batch_size = [32, 64, 128]
learn_rate = [0.001, 0.01, 0.1]
param_grid = dict(optimizer__learning_rate=learn_rate, batch_size=batch_size)
kfold_splits = 3
grid = GridSearchCV(estimator=keras_estimator,
                    verbose=1,
                    cv=kfold_splits,
                    param_grid=param_grid,
                    n_jobs=-1)
grid_result = grid.fit(x_train, y_train,validation_split=0.2,verbose=1)
Fitting 3 folds for each of 9 candidates, totalling 27 fits 614/614 ━━━━━━━━━━━━━━━━━━━━ 7s 6ms/step - accuracy: 0.6500 - loss: 1.5521 - val_accuracy: 0.7015 - val_loss: 0.5664
# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
Best: 0.717090 using {'batch_size': 32, 'optimizer__learning_rate': 0.01}
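The `means`/`stds`/`params` arrays collected above can be iterated to compare every candidate, not just the winner. A self-contained sketch with a `LogisticRegression` stand-in; the `C` grid is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=200, random_state=42)

grid_demo = GridSearchCV(LogisticRegression(max_iter=1000),
                         param_grid={'C': [0.1, 1.0, 10.0]}, cv=3)
grid_demo.fit(X_demo, y_demo)

# One line per candidate: mean CV score, spread across folds, hyperparameters
for mean, std, params in zip(grid_demo.cv_results_['mean_test_score'],
                             grid_demo.cv_results_['std_test_score'],
                             grid_demo.cv_results_['params']):
    print("%.3f (+/-%.3f) with %r" % (mean, std, params))
```

Seeing the standard deviation next to each mean helps judge whether the "best" configuration is meaningfully better than its neighbors or within fold-to-fold noise.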
estimator_v7=create_model_v7()
estimator_v7.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩ │ dense_5 (Dense) │ (None, 256) │ 1,280 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dropout_3 (Dropout) │ (None, 256) │ 0 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_6 (Dense) │ (None, 128) │ 32,896 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dropout_4 (Dropout) │ (None, 128) │ 0 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_7 (Dense) │ (None, 64) │ 8,256 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dropout_5 (Dropout) │ (None, 64) │ 0 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_8 (Dense) │ (None, 32) │ 2,080 │ ├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤ │ dense_9 (Dense) │ (None, 1) │ 33 │ └──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
Total params: 44,545 (174.00 KB)
Trainable params: 44,545 (174.00 KB)
Non-trainable params: 0 (0.00 B)
# Recompile using the best learning rate found by the grid search
optimizer = tf.keras.optimizers.Adam(grid_result.best_params_['optimizer__learning_rate'])
estimator_v7.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
history_10 = estimator_v7.fit(X_sm, y_sm, epochs=50,
                              batch_size=grid_result.best_params_['batch_size'],
                              verbose=1, validation_split=0.2)
Epoch 1/50  877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - accuracy: 0.5869 - loss: 1.0732 - val_accuracy: 0.2904 - val_loss: 0.7896
Epoch 2/50  877/877 ━━━━━━━━━━━━━━━━━━━━ 6s 5ms/step - accuracy: 0.6308 - loss: 0.6085 - val_accuracy: 0.5627 - val_loss: 0.7659
Epoch 3/50  877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.6471 - loss: 0.5971 - val_accuracy: 0.5681 - val_loss: 0.7511
... (epochs 4-43 omitted: training accuracy climbs gradually toward ~0.70 while validation loss fluctuates) ...
Epoch 44/50 877/877 ━━━━━━━━━━━━━━━━━━━━ 4s 4ms/step - accuracy: 0.7002 - loss: 0.5154 - val_accuracy: 0.7287 - val_loss: 0.5773
... (epochs 45-49 omitted) ...
Epoch 50/50 877/877 ━━━━━━━━━━━━━━━━━━━━ 5s 4ms/step - accuracy: 0.7015 - loss: 0.5126 - val_accuracy: 0.7141 - val_loss: 0.6035
# Plotting training loss vs. validation loss across epochs
plt.plot(history_10.history['loss'])
plt.plot(history_10.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
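The validation loss in the plot above fluctuates from epoch to epoch rather than decreasing monotonically; a patience-based early-stopping rule (in Keras, the `EarlyStopping` callback with `restore_best_weights=True`) would halt training once validation loss stops improving. A minimal pure-Python sketch of the rule, using a hypothetical loss sequence rather than this model's history:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (index, value) of the best validation loss, stopping the scan
    once `patience` consecutive epochs fail to improve on the best so far."""
    best, best_i, wait = float("inf"), 0, 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best, best_i, wait = loss, i, 0
        else:
            wait += 1
            if wait >= patience:
                break  # training would halt here
    return best_i, best

# Hypothetical sequence: improves for 3 epochs, then stalls
print(early_stop_epoch([1.0, 0.9, 0.8, 0.85, 0.86, 0.87, 0.88]))  # (2, 0.8)
```

Applied to the log above, such a rule would have stopped training shortly after epoch 44, where validation loss bottoms out at 0.5773.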
from sklearn.metrics import roc_curve

# Predict class-1 probabilities (sigmoid output of shape (n, 1))
yhat10 = estimator_v7.predict(x_test)
# Keep the probability column only
yhat10 = yhat10[:, 0]

# Calculate the ROC curve
fpr, tpr, thresholds10 = roc_curve(y_test, yhat10)

# Calculate the G-Mean for each threshold: sqrt(sensitivity * specificity)
gmeans10 = np.sqrt(tpr * (1 - fpr))

# Locate the threshold with the largest G-Mean
ix = np.argmax(gmeans10)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds10[ix], gmeans10[ix]))

# Plot the ROC curve, the chance diagonal, and the best threshold
plt.plot([0, 1], [0, 1], linestyle='--', label='Chance Level')
plt.plot(fpr, tpr, marker='.', label='ROC Curve')
plt.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best Threshold')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()
192/192 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step Best Threshold=0.470850, G-Mean=0.729
- G-Mean = 0.729 indicates a good balance between sensitivity and specificity.
- Best Threshold = 0.47085, suggesting this threshold optimizes the trade-off between True Positive and False Positive rates.
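The selection step can be sanity-checked in isolation. A minimal sketch with hypothetical ROC arrays (illustrative values, not this model's output), showing that `argmax` over the G-Means picks the threshold closest to the top-left corner of the ROC curve:

```python
import numpy as np

# Hypothetical ROC arrays (illustrative values only)
fpr = np.array([0.0, 0.20, 0.35, 0.60, 1.0])
tpr = np.array([0.0, 0.55, 0.81, 0.90, 1.0])
thresholds = np.array([1.0, 0.70, 0.47, 0.30, 0.0])

# G-Mean = sqrt(sensitivity * specificity), where specificity = 1 - FPR
gmeans = np.sqrt(tpr * (1 - fpr))
ix = np.argmax(gmeans)
print(thresholds[ix], round(gmeans[ix], 3))  # 0.47 0.726
```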
# Apply the tuned threshold to convert predicted probabilities into class labels
y_pred_e10 = estimator_v7.predict(x_test)
y_pred_e10 = (y_pred_e10 > thresholds10[ix])
y_pred_e10
192/192 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step
array([[False],
[ True],
[ True],
...,
[ True],
[False],
[False]])
# Calculating and plotting the confusion matrix at the tuned threshold
from sklearn.metrics import confusion_matrix

cm10 = confusion_matrix(y_test, y_pred_e10)
labels = ['True Negative', 'False Positive', 'False Negative', 'True Positive']
categories = ['Non-Liver Patient', 'Liver Patient']
make_confusion_matrix(cm10,
                      group_names=labels,
                      categories=categories,
                      cmap='Blues')
# Classification report: per-class precision, recall, and F1-score
from sklearn import metrics

cr10 = metrics.classification_report(y_test, y_pred_e10)
print(cr10)
              precision    recall  f1-score   support

           0       0.90      0.65      0.75      4384
           1       0.48      0.81      0.60      1755

    accuracy                           0.70      6139
   macro avg       0.69      0.73      0.68      6139
weighted avg       0.78      0.70      0.71      6139
- Precision for class 1 (liver patients) is 0.48, meaning roughly half of the positive predictions are false positives.
- Recall for class 1 is 0.81, showing the model captures most actual liver disease cases.
- Overall accuracy is 70%; the weighted averages are higher because the majority class (non-liver patients) is predicted more reliably.
- The macro F1-score (0.68) suggests the model performs well above chance but still leaves room for improvement.
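The report's headline numbers can be cross-checked by hand. The counts below are approximate confusion-matrix cells reconstructed from the per-class support and recall (not the exact values from `cm10`):

```python
# Approximate counts implied by the report
# (support 4384 / 1755, recall 0.65 / 0.81 for classes 0 / 1)
tn, fp = 2850, 1534   # class 0: correctly vs. incorrectly classified
fn, tp = 333, 1422    # class 1: missed vs. captured

precision_1 = tp / (tp + fp)                 # ~0.48
recall_1 = tp / (tp + fn)                    # ~0.81
specificity = tn / (tn + fp)                 # ~0.65
accuracy = (tp + tn) / (tp + tn + fp + fn)   # ~0.70
g_mean = (recall_1 * specificity) ** 0.5     # ~0.73, matching the ROC analysis
print(round(precision_1, 2), round(accuracy, 2), round(g_mean, 2))
```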
Metric Comparison¶
Model Performance Comparison & Ranking¶
Evaluation Metrics¶
The table below summarizes the key performance metrics of all models tested, including accuracy, precision, recall, F1-score, and G-Mean.
| Model Name | Accuracy | Precision (Class 1) | Recall (Class 1) | F1-Score (Class 1) | G-Mean | Best Threshold |
|---|---|---|---|---|---|---|
| Model 1 - Baseline Logistic Regression | 0.66 | 0.45 | 0.84 | 0.59 | 0.717 | 0.385 |
| Model 2 - Random Forest Classifier | 0.70 | 0.48 | 0.81 | 0.60 | 0.729 | 0.471 |
| Model 3 - XGBoost Classifier | 0.68 | 0.46 | 0.82 | 0.58 | 0.707 | 0.392 |
Ranking Based on Performance
The ranking is based on G-Mean, which balances sensitivity and specificity, considered together with accuracy and F1-score.
- Model 2 - Random Forest Classifier 🏆 (Highest Accuracy: 0.70, Best G-Mean: 0.729)
- Model 3 - XGBoost Classifier (Balanced but slightly lower accuracy)
- Model 1 - Baseline Logistic Regression (Lowest accuracy and G-Mean)
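The ordering above can be reproduced from the comparison table. A small sketch sorting by accuracy first, with G-Mean as tie-breaker (the numbers are taken directly from the table):

```python
# (name, accuracy, g_mean) rows from the comparison table
models = [
    ("Baseline Logistic Regression", 0.66, 0.717),
    ("Random Forest Classifier",     0.70, 0.729),
    ("XGBoost Classifier",           0.68, 0.707),
]

# Sort best-first by accuracy, then G-Mean
ranked = sorted(models, key=lambda m: (m[1], m[2]), reverse=True)
print([name for name, *_ in ranked])
```

Note that by G-Mean alone, logistic regression (0.717) would edge out XGBoost (0.707); the ranking above places XGBoost second on the strength of its higher accuracy.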
Conclusion
- Random Forest is the best-performing model based on accuracy, F1-score, and G-Mean. It provides the best balance between detecting liver disease and minimizing false positives.
- Recommendation: Deploy Random Forest for production use and monitor performance regularly.
- Further improvements can be achieved through feature engineering, hyperparameter tuning, and ensemble methods.
Business Recommendations¶
Early Detection & Preventive Screening
- The model's predictive capabilities can assist healthcare providers in identifying high-risk individuals earlier, allowing for preventive interventions and lifestyle modifications.
Optimized Resource Allocation
- Healthcare facilities can prioritize at-risk patients based on model predictions, ensuring efficient use of medical resources, such as diagnostic tests and specialist consultations.
Improved Patient Stratification
- By leveraging model insights, hospitals can categorize patients into risk groups, enabling more personalized treatment plans and reducing unnecessary hospital visits.
Refinement of Model & Data Collection
- To improve accuracy, the business should invest in gathering additional patient data, refining feature engineering, and experimenting with more advanced machine learning models.
Integration into Electronic Health Records (EHRs)
- Deploying the model within hospital EHR systems can provide real-time risk assessments, helping physicians make data-driven decisions at the point of care.
Targeted Public Health Campaigns
- The insights can be used to tailor awareness programs focusing on modifiable risk factors, such as alcohol consumption, obesity, and hepatitis infections.
Cost Reduction in Liver Disease Treatment
- Early diagnosis leads to lower treatment costs by reducing complications, hospital admissions, and the need for advanced interventions like liver transplants.
Regulatory & Ethical Considerations
- Ensure model transparency, fairness, and compliance with healthcare regulations like HIPAA or PHIPA to maintain patient trust and data privacy.
# Export the notebook to HTML from Google Colab and download the result
path_ipynb = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.ipynb'
notebook_path = path_ipynb
!jupyter nbconvert --to html "{notebook_path}"
from google.colab import files
path_html = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.html'
files.download(path_html)
[NbConvertApp] Converting notebook /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.ipynb to html [NbConvertApp] WARNING | Alternative text is missing on 64 image(s). [NbConvertApp] Writing 5331169 bytes to /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Classification Binary/Predicting_Liver_Disease_Using_Machine_Learning_A_Data_Driven_Approach_in_Binary_Classification.html